The IETF is nearing completion of a new protocol aimed at minimizing problems
resulting from misconfigured network devices, thereby easing part of the
systems administration load. This XML-based protocol, NetConf, is intended to
reduce the programming effort required to automate device configuration.

According to the NetConf Working Group Web site, “configuration of
networks of devices has become a critical requirement for operators in today’s
highly interoperable networks. Operators from large to small have developed
their own mechanisms or used vendor-specific mechanisms to transfer
configuration data to and from a device, and for examining device state information
which may impact the configuration. Each of these mechanisms may be
different in various aspects, such as session establishment, user authentication,
configuration data exchange, and error responses.”

The NetConf Working Group aims to produce a protocol suitable for network
configuration that includes the following characteristics (taken from the Web site):


• Provides retrieval mechanisms that can differentiate between configuration data and non-configuration data.

• Is extensible enough that vendors will provide access to all configuration data on the device using a single protocol.

• Has a programmatic interface (avoids screen scraping and formatting-related changes between releases).

• Uses a textual data representation that can be easily manipulated using non-specialized text manipulation tools.

• Supports integration with existing user authentication methods.

• Supports integration with existing configuration database systems.

• Supports network wide configuration transactions (with features such as locking and rollback capability).

• Is as transport-independent as possible.

• Provides support for asynchronous notifications.


For more details, please refer to the NetConf Working Group Web site at:
http://www.ietf.org/html.charters/netconf-charter.html


Also this month, Sys Admin magazine and the SANS Institute are working 
together to gather the data for the annual salary survey from SANS, and we 
need your help to be sure the survey represents as much of the community as 
possible. We would appreciate it if you would take a few minutes to complete 
the 2005 Information Security Career Advancement Survey form online at: 


http://www.sans.org/2006CareerSurvey 


After all the data has been collected and tallied, we will publish the results of 
the salary survey in an upcoming issue of Sys Admin. 

As always, if you have an idea for an article, please send a proposal to Rikki
Endsley at: rendsley@cmp.com. And, if you have any comments or suggestions,
please don’t hesitate to contact me at: aankerholz@cmp.com.


Sincerely yours, 


Amber Ankerholz 
Editor in Chief


Ryan Matteson


With the recent release of the Solaris 10 Operating System, Sun
unleashed the Dynamic Tracing Facility (DTrace) on the
world. Dynamic tracing can be used by administrators,
developers, and quality engineering to track down reliability and
performance problems, correlate disparate events, and observe applications
and the Solaris kernel in ways that have never before been possible.

The DTrace framework works by integrating several providers
into the Solaris kernel. Each provider contains a set of probes,
which are locations in the kernel where actions (e.g., capture a stack
trace, record an argument to a function, etc.) can be applied when
the probe is triggered. Probes are enabled programmatically through
libdtrace or, preferably, through the dtrace(1M) command-line utility.

DTrace comes bundled with a language called “D”. D is a scripting
language that is well-suited for developing tools to observe
application resources and system behavior. The DTraceToolkit is a
collection of D scripts that includes tools to view the virtual memory
subsystem, network utilization, CPU activity, locking behavior,
and the I/O subsystem. In this article, I’ll introduce the
DTraceToolkit and demonstrate how to use the Toolkit to analyze
I/O behavior on a Solaris server.


DTraceToolkit Installation 

The DTraceToolkit is available as a compressed tar archive and 
can be installed to your favorite directory location with the wget and 
gtar utilities: 


$ wget -q -O - http://www.brendangregg.com/DTraceToolkit-latest.tar.gz | gtar xz


All of the scripts in the DTraceToolkit are organized into directories 
that identify their purpose (e.g., Disk, Memory, etc.) and are linked 
into a Bin directory for easy access:

$ ls -la
total 304
drwxr-xr-x  34 matty matty   1156 Jul 26 01:51 .
drwx------  12 matty matty
drwxr-xr-x  84 matty matty   2856 Jul 26 01:43 Bin
drwxr-xr-x   9 matty matty    306 Jun 28 00:24 Cpu
drwxr-xr-x   9 matty matty    306 Jul 25 06:40 Disk
drwxr-xr-x  16 matty matty    544 Jul 26 01:44 Docs
drwxr-xr-x   3 matty matty    102 Jun 11 08:52 Extra
-rw-r--r--   1 matty matty   2392 Jul 26 01:43 Guide
drwxr-xr-x   9 matty matty    306 Jun 13 15:32 Kernel
lrwxr-xr-x   1 matty matty     14 Aug  7 21:23 License -> Docs/cddl1.txt
drwxr-xr-x   4 matty matty    136 May 16 03:32 Locks
drwxr-xr-x   3 matty matty    102 Jun 13 08:36 Man
drwxr-xr-x  12 matty matty    408 Jul 25 02:57 Mem
drwxr-xr-x  10 matty matty    340 Jul 25 09:06 Net
drwxr-xr-x  25 matty matty    850 Jul 25 13:36 Proc
lrwxr-xr-x   1 matty matty      5 Aug  7 21:23 README -> Guide
drwxr-xr-x   7 matty matty    238 Jul 25 11:05 System
drwxr-xr-x   4 matty matty    136 May 15 04:16 User
-rw-r--r--   1 matty matty     40 Jul 26 01:51 Version
drwxr-xr-x   4 matty matty    136 Jul 26 01:41 Zones
-rwxr-xr-x   1 matty matty   7143 Jul 23 07:51 dappprof
-rwxr-xr-x   1 matty matty   7626 Jul 23 07:52 dapptrace
-rwxr-xr-x   1 matty matty  13576 Jul 23 07:56 dexplorer
-rwxr-xr-x   1 matty matty  13461 Jul 23 08:02 dtruss
-rwxr-xr-x   1 matty matty   6224 Jul 23 08:03 dvmstat
-rwxr-xr-x   1 matty matty   4356 Jul 23 08:04 errinfo
-rwxr-xr-x   1 matty matty   4883 Jul 23 08:05 execsnoop
-rwxr-xr-x   1 matty matty   4275 May 15 01:03 install
-rwxr-xr-x   1 matty matty  11983 Jul 25 07:26 iosnoop
-rwxr-xr-x   1 matty matty  10954 Jul 26 00:56 iotop
-rwxr-xr-x   1 matty matty   6189 Jul 23 08:34 opensnoop
-rwxr-xr-x   1 matty matty   6043 Jul 23 08:49 procsystime
-rwxr-xr-x   1 matty matty   5483 Jul 23 21:46 rwsnoop
-rwxr-xr-x   1 matty matty   6514 Jul 23 14:51 rwtop


A large number of the scripts contain a “-h” (help) option, which 
prints a short description of the available flags and options: 


$ iopattern -h
USAGE: iopattern [-v] [-d device] [-f filename] [-m mount_point]
                 [interval [count]]

                -v              # print timestamp
                -d device       # instance name to snoop
                -f filename     # snoop this file only
                -m mount_point  # this FS only
   eg,
        iopattern          # default output, 1 second samples
        iopattern 10       # 10 second samples
        iopattern 5 12     # print 12 x 5 second samples
        iopattern -m /     # snoop events on filesystem / only


The README file in the top-level directory provides a description
of each directory, and the Docs/Contents file contains a description
of each script. Now that we know how the toolkit is laid out, let’s
put this bad boy to work!


Monitoring Physical I/O Activity 

One of the biggest challenges we face as systems administrators
is identifying I/O performance problems. I/O problems are often
caused by improper tuning, over-utilized disk resources, and memory
shortages. When I/O performance problems surface, they usually
lead to application latency and reduced database throughput. The
DTraceToolkit comes with the iosnoop script, which can be used to
monitor physical I/O operations, measure I/O latency, and correlate
physical I/O activity to individual processes:


$ iosnoop -e
DEVICE   UID   PID D     BLOCK   SIZE COMM      PATHNAME
dad1     100   522 R   2821088   8192 bash      /usr/bin/su
md0      100   522 R   2821088   8192 bash      /usr/bin/su
dad0     100   522 R    332144   8192 su        /lib/libbsm.so.1
md0      100   522 R    332144   8192 su        /lib/libbsm.so.1
dad1     100   522 R    288484   6144 su        /lib/libmd5.so.1
md0      100   522 R    288484   6144 su        /lib/libmd5.so.1
dad1     100   522 R    334944   8192 su        /lib/libmp.so.2
md0      100   522 R    334944   8192 su        /lib/libmp.so.2
dad1      25   527 R  10114400   8192 sendmail  /usr/lib/sendmail
md0       25   527 R  10114400   8192 sendmail  /usr/lib/sendmail
dad0      25   527 R  10114160   8192 sendmail  /usr/lib/sendmail
md0       25   527 R  10114160   8192 sendmail  /usr/lib/sendmail
dad1      25   527 R  10114448   8192 sendmail  /usr/lib/sendmail
md0       25   527 R  10114448   8192 sendmail  /usr/lib/sendmail


When the “-e” (print device name) option is passed to iosnoop, the device
name to which the I/O operation is targeted is printed in column 1.
The second and third columns contain the uid and process id of the
process that issued the read or write operation. The fourth column
contains an “R” for a read operation or a “W” for a write operation. The
fifth column displays the starting logical block in the file system where
data is being read or written, and the sixth column displays the size of
the read or write operation. The seventh column displays the name of
the executable that performed the read or write operation, and the last
column contains the name of the file that is being read from or written to.

When iosnoop is invoked with the “-o” (print disk delta time, us)
option, the number of microseconds required to service each physical
read and write operation will be displayed in the “DTIME” column:


$ iosnoop -e -o
DEVICE  DTIME  UID  PID D     BLOCK  SIZE COMM    PATHNAME
sd5       555  101  377 W   6927040  8192 oracle  /u03/oradata/proddb/redo03.log
sd6      8618  101  377 W   6927040  8192 oracle  /u03/oradata/proddb/redo03.log
md120   10967  101  377 W   6927040  8192 oracle  /u03/oradata/proddb/redo03.log
sd1     12300  101  460 R  14771248  8192 oracle  /u01/app/oracle/product/ \
                                                  10.0.2.0/bin/oracle
sd2     12712  101  460 R  14771248  8192 oracle  /u01/app/oracle/product/ \
                                                  10.0.2.0/bin/oracle
sd3      9229  101  460 W  21360480  8192 oracle  /u02/oradata/proddb/temp01.dbf
sd4     12442  101  460 W  21360480  8192 oracle  /u02/oradata/proddb/temp01.dbf
md110   13751  101  460 W  21360480  8192 oracle  /u02/oradata/proddb/temp01.dbf
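
Output in this form lends itself to simple post-processing. The following is a minimal Python sketch, not part of the DTraceToolkit, which assumes the whitespace-separated column order shown above (DEVICE, DTIME, UID, PID, D, BLOCK, SIZE, COMM, PATHNAME) and averages DTIME per device. The function name mean_dtime_by_device is made up for illustration, and the third sample row is invented to show the averaging:

```python
from collections import defaultdict

def mean_dtime_by_device(lines):
    """Return {device: mean DTIME in microseconds} from `iosnoop -e -o` rows.

    Assumes the column order DEVICE DTIME UID PID D BLOCK SIZE COMM PATHNAME.
    """
    totals = defaultdict(lambda: [0, 0])  # device -> [sum of DTIME, row count]
    for line in lines:
        fields = line.split()
        # Skip blank lines, the header row, and anything that is not a data row.
        if len(fields) < 9 or not fields[1].isdigit():
            continue
        device, dtime = fields[0], int(fields[1])
        totals[device][0] += dtime
        totals[device][1] += 1
    return {dev: total / count for dev, (total, count) in totals.items()}

# Illustrative input: two rows from the listing above plus one invented row.
sample = [
    "DEVICE DTIME UID PID D BLOCK SIZE COMM PATHNAME",
    "sd5 555 101 377 W 6927040 8192 oracle /u03/oradata/proddb/redo03.log",
    "sd6 8618 101 377 W 6927040 8192 oracle /u03/oradata/proddb/redo03.log",
    "sd5 645 101 377 W 6927056 8192 oracle /u03/oradata/proddb/redo03.log",
]
for device, mean in sorted(mean_dtime_by_device(sample).items()):
    print(f"{device} {mean:.0f}")
```

In practice the input would come from a file of captured iosnoop output rather than a hard-coded list.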


The iosnoop script will continuously display physical read and write 
operations for all processes on a system, which can often overwhelm 
a terminal screen on a busy server. To limit the output to a single 
process (possibly one that is causing problems), the process name can 
be passed to iosnoop’s “-n” (this process name only) option:

$ iosnoop -a -n sshd
STRTIME              DEVICE MAJ MIN UID PID D   BLOCK SIZE PATHNAME ARGS
2005 Aug 28 16:41:50 dad1   136   8   0 750 W 3435040 3072 /var/adm/lastlog
                                                           /usr/lib/ssh/sshd
2005 Aug 28 16:41:50 dad1   136   8   0 750 W 2993203  512 <none>
                                                           /usr/lib/ssh/sshd
2005 Aug 28 16:41:50 dad1   136   8   0 750 W 5259776 8192 /var/adm/wtmpx
                                                           /usr/lib/ssh/sshd
2005 Aug 28 16:41:57 dad1   136   8   0 761 W 3435040 3072 /var/adm/lastlog
                                                           /usr/lib/ssh/sshd
2005 Aug 28 16:41:57 dad1   136   8   0 761 W 2993204 9216 <none>
                                                           /usr/lib/ssh/sshd
2005 Aug 28 16:42:02 dad1   136   8   0 770 W 3435040 3072 /var/adm/lastlog
                                                           /usr/lib/ssh/sshd
2005 Aug 28 16:42:02 dad1   136   8   0 770 W 2993222 1024 <none>
                                                           /usr/lib/ssh/sshd
2005 Aug 28 16:42:08 dad1   136   8   0 692 W 2832720 4096 /var/adm/utmpx
                                                           /usr/lib/ssh/sshd


The continuously scrolling display is useful for capturing data over
long periods of time. When a point-in-time snapshot is required, the
DTraceToolkit’s iotop script can be used to view physical I/O activity
in a “top”-like display:


$ iotop 2

2005 Aug 13 11:36:43, load: 0.16, disk_r: 96 Kb, disk_w: 109890 Kb

UID   PID  PPID CMD      DEVICE MAJ MIN D     BYTES
  0     0     0 sched    dad0   136   7 W       512
  0     3     0 fsflush  dad0   136   7 W       512
  0     3     0 fsflush  dad1   136  15 W       512
  0     0     0 sched    dad1   136  15 W       512
  0     3     0 fsflush  dad1   136   8 W      3072
  0     3     0 fsflush  md0     85   0 W      3072
  0     3     0 fsflush  dad0   136   0 W      3072
100   502   484 mkfile   dad0   136   7 W      3584
100   502   484 mkfile   dad1   136  15 W      3584
  0   501   498 dtrace   dad1   136   8 R      8192
  0   501   498 dtrace   dad0   136   0 R      8192
  0   501   498 dtrace   md0     85   0 R     16384
100   502   484 mkfile   dad1   136   8 R     16384
100   502   484 mkfile   dad0   136   0 R     16384
100   502   484 mkfile   md0     85   0 R     32768
100   502   484 mkfile   md0     85   0 W  34881536
100   502   484 mkfile   dad1   136   8 W  34881536
100   502   484 mkfile   dad0   136   0 W  34881536


The output of iotop will be refreshed every five seconds by default.
To adjust this interval, a positive integer value can be passed as an
option to iotop. Because the iosnoop and iotop scripts were
designed to monitor physical I/O activity, reads serviced by the page
cache and operations targeted at pseudo-devices will not be displayed
by iosnoop and iotop. The section “Monitoring Read and
Write Activity” will describe two additional scripts that can be used
to provide a holistic view of read and write activity, which can be
useful for viewing accesses to pseudo-devices and cached data.


Monitoring Read and Write Activity 


When a process needs to read from or write to a file or device,
the process will invoke one of the kernel’s read or write system
calls. The read or write may occur to a file, pseudo-device, or to one
of the physical devices attached to the system. The rwsnoop script
can capture read and write activity, which can be helpful for monitoring
I/O activity when file system caches are in use and to view
accesses to pseudo-devices and virtual file systems that reside in
memory. To view read and write operations along with the process
that generated the request, the rwsnoop script can be invoked:


$ rwsnoop
UID   PID CMD     D    BYTES FILE
101   379 oracle  W    16384 /u01/oradata/proddb/control01.ctl
101   379 oracle  W    16384 /u01/oradata/proddb/control02.ctl
101   379 oracle  W    16384 /u01/oradata/proddb/control03.ctl
101   460 oracle  W  1048576 /u02/oradata/proddb/tbs01.dbf
101   460 oracle  W  1048576 /u02/oradata/proddb/rbs01.dbf
101   460 oracle  W  1048576 /u02/oradata/proddb/tbs01.dbf
101   377 oracle  W     1024 /u03/oradata/proddb/redo03.log
101   377 oracle  W     1024 /u03/oradata/proddb/redo03.log
101   371 oracle  R      416 /proc/371/psinfo
101   371 oracle  R      416 /proc/373/psinfo
101   371 oracle  R      416 /proc/375/psinfo

Rwsnoop will display which process is performing the read or write
operation, the size and type (read or write) of that operation, and the
file or device the process is reading or writing to. Like iosnoop, the
output will be continuously displayed on the terminal screen. To
view a summary of read and write activity in a “top”-like display,
the rwtop script can be used:


$ rwtop 2

2005 Jul 24 10:47:26, load: 0.18, app_r: 9 Kb, app_w: 8 Kb

UID   PID  PPID CMD    D  BYTES
100   922   920 bash   R      3
100   922   920 bash   W     15
100   902   899 sshd   R   1223
100   926   922 ls     R   1267
100   902   899 sshd   W   1344
100   926   922 ls     W   2742
100   920   917 sshd   R   2946
100   920   917 sshd   W   4819
  0   404     1 vxsvc  R   5120


The screen will be refreshed every five seconds by default, which 
can be adjusted by passing a new refresh interval to rwtop. 
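
The relationship between the two scripts can be illustrated offline: rwsnoop reports individual operations, while rwtop reports totals per process. The Python sketch below is an illustration only, not how rwtop is implemented (rwtop aggregates inside DTrace). It assumes rwsnoop's column order (UID, PID, CMD, D, BYTES, FILE), and bytes_by_process is a name invented here:

```python
from collections import Counter

def bytes_by_process(lines):
    """Sum BYTES per (PID, CMD, direction) from rwsnoop-style rows.

    Assumes the column order UID PID CMD D BYTES FILE.
    """
    totals = Counter()
    for line in lines:
        fields = line.split()
        # Data rows have a numeric PID and a numeric byte count.
        if len(fields) < 6 or not (fields[1].isdigit() and fields[4].isdigit()):
            continue
        pid, cmd, direction = int(fields[1]), fields[2], fields[3]
        totals[(pid, cmd, direction)] += int(fields[4])
    return totals

sample = [
    "UID PID CMD D BYTES FILE",
    "101 379 oracle W 16384 /u01/oradata/proddb/control01.ctl",
    "101 379 oracle W 16384 /u01/oradata/proddb/control02.ctl",
    "101 371 oracle R 416 /proc/371/psinfo",
]
# Print in ascending order of bytes, like the rwtop sample above.
for (pid, cmd, direction), nbytes in sorted(
        bytes_by_process(sample).items(), key=lambda kv: kv[1]):
    print(pid, cmd, direction, nbytes)
```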


I/O Size Distributions


As systems administrators, we are often asked to identify the
size of I/O operations issued by each application. This information
is used by systems and application administrators to tune file system
block and cluster sizes, SCSI and LVM transfer sizes, and RAID 0
stripe widths to match application I/O patterns.

Before the introduction of the DTraceToolkit, the vxtrace(1M),
prex(1), and truss(1) utilities were the best methods to find
this information. The output from these utilities was often
extremely verbose, and the process of correlating data and determining
I/O size distributions was cumbersome and prone to error.

Now that the DTraceToolkit is available, this information can
be easily determined by running the bitesize.d script. bitesize.d will
determine I/O size distributions for all processes in a Solaris system
and will print a histogram with the number of I/O operations
displayed by size:


$ bitesize.d
Sampling... Hit Ctrl-C to end.

     377  ora_lgwr_proddb

           value  ------------- Distribution ------------- count
             256 |                                         0
             512 |@@@@@@@@@@@@@                            604
            1024 |                                         2
            2048 |                                         0
            4096 |                                         2
            8192 |@@@@@@@@@@@@@@@@@@@@@@@@@                1218
           16384 |@@                                       102
           32768 |                                         0
           65536 |                                         0
          131072 |                                         12
          262144 |                                         0

    7312  dbload pages=128

           value  ------------- Distribution ------------- count
              16 |                                         0
              32 |                                         2
              64 |                                         0
             128 |                                         0
             256 |                                         0
             512 |                                         2
            1024 |                                         0
            2048 |                                         0
            4096 |                                         0
            8192 |                                         0
           16384 |                                         0
           32768 |                                         0
           65536 |                                         0
          131072 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 76947
          262144 |                                         0


This display shows the output from the Oracle log writer and a database
load process. The histogram shows that the Oracle log writer
was using predominantly 8k-16k I/O sizes. The database loader,
on the other hand, was requesting 1-MB chunks (128 pages * 8k
per page), but we can see from this display that the majority of the
I/O operations were 128k-256k in size. This was caused by the
md_maxphys value, which is the largest default I/O size that can be
performed to an SVM volume.


I/O Access Patterns

When deploying new applications to Solaris systems, it is essential
to understand how an application accesses data. Access patterns
can be random or sequential and often vary in size. Understanding
access patterns can help when tuning direct I/O and caching attributes
and is invaluable for finding the optimal block sizes for applications.

Before the introduction of DTrace, an administrator was
required to discuss access patterns with vendors and development
teams or parse through mounds of truss(1) and vxtrace(1M)
data to estimate how applications access files and volumes. The data
was often skewed, required a good deal of time to derive interesting
metrics, and was often prone to error.

Now that DTrace is able to capture data throughout the kernel, the
job of finding access patterns has been greatly simplified. The
DTraceToolkit includes the iopattern and seeksize.d scripts, which can
both be used to display per-process and system-wide access patterns.
To get system-wide access patterns, the iopattern script can be used:


$ iopattern
%RAN %SEQ  COUNT    MIN     MAX    AVG     KR     KW
  95    5    101    512  131072  30649   2792    231
  60   40    174    512  131072  61557    951   9509
  58   42    187    512  131072  35369   2232   4227
  53   47    229    512  131072  65880   3992  10741
  42   58    170    512  131072  89798   5328   9580
  67   33    190    512  131072  52135   3864   5809
  80   20    195    512  131072   6516   1207     34
  84   16    224    512   24576   7860   1574    145
  89   11    199   1024    8192   7682   1481     12


Iopattern provides the percentage of random and sequential I/O, 
the number and total size of the I/O operations performed during 
this sample period, and it provides the minimum, maximum, and 
average I/O sizes. To get the I/O distribution for each process, the 
seeksize.d script can be used: 


$ seeksize.d
Sampling... Hit Ctrl-C to end.
^C

     PID  CMD
    7312  dd if=/dev/dsk/c1t1d0s2 of=/dev/null bs=1048576

           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1762
               1 |                                         0

     PID  CMD
     377  ora_dbw0_proddb

           value  ------------- Distribution ------------- count
        -8388608 |                                         0
        -4194304 |                                         3
        -2097152 |                                         28
        -1048576 |                                         44
         -524288 |                                         95
         -262144 |                                         102
         -131072 |                                         166
          -65536 |@                                        363
          -32768 |@@                                       585
          -16384 |@                                        328
           -8192 |                                         141
           -4096 |                                         105
           -2048 |                                         57
           -1024 |                                         35
            -512 |                                         34
            -256 |                                         40
            -128 |                                         31
             -64 |                                         35
             -32 |                                         47
             -16 |                                         22
              -8 |                                         6
              -4 |                                         20
              -2 |                                         0
              -1 |                                         0
               0 |@@@@@@@@@@@@@                            4164
               1 |                                         0
               2 |@@@@                                     926
               4 |@                                        208
               8 |@@@@@@                                   2148
              16 |@                                        440
              32 |@                                        410
              64 |@                                        318
             128 |@                                        225
             256 |@                                        190
             512 |                                         148
            1024 |                                         139
            2048 |                                         155
            4096 |                                         168
            8192 |@                                        183
           16384 |@                                        377
           32768 |@@                                       657
           65536 |@                                        406
          131072 |@                                        199
          262144 |                                         151
          524288 |                                         111
         1048576 |                                         37
         2097152 |                                         14
         4194304 |                                         3
         8388608 |                                         0


Seeksize.d measures the seek distance between consecutive reads and 
writes and provides a histogram with the number of sectors traveled 
between consecutive read or write operations. Applications that read 
or write data in a sequential fashion will contain a small distribution 
(e.g., dd in the example above). Applications that read or write data in 
a random fashion will cause the disk heads to seek between read and 
write operations, which will cause a wide distribution (e.g., the Oracle 
database writer in the example above) in the seeksize.d output. 
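
To make the idea of a seek-distance histogram concrete, here is a small Python sketch. It is an approximation written for this explanation, not the logic of seeksize.d itself: it assumes each I/O event is a (starting block, size in bytes) pair, that blocks are 512-byte sectors, and that the distance is measured from where the previous operation ended. The names bucket and seek_histogram are invented, and the power-of-two bucketing mimics the histograms shown above.

```python
from collections import Counter

SECTOR_BYTES = 512  # assumed sector size

def bucket(value):
    """Return the power-of-two bucket for value (0 stays 0; sign is preserved)."""
    if value == 0:
        return 0
    magnitude = 1 << (abs(value).bit_length() - 1)  # largest power of two <= |value|
    return magnitude if value > 0 else -magnitude

def seek_histogram(events):
    """events: list of (start_block, size_in_bytes) in the order they were issued."""
    histogram = Counter()
    expected = None  # block at which a purely sequential stream would continue
    for start, size in events:
        if expected is not None:
            histogram[bucket(start - expected)] += 1
        expected = start + size // SECTOR_BYTES
    return histogram

# Hypothetical event streams: one sequential, one that jumps around the disk.
sequential = [(0, 8192), (16, 8192), (32, 8192)]
scattered = [(0, 8192), (5000, 8192), (100, 8192)]
print(dict(seek_histogram(sequential)))
print(dict(seek_histogram(scattered)))
```

The sequential stream lands entirely in the zero bucket, as dd does above, while the scattered stream spreads across positive and negative buckets, as the database writer does.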


Conclusions 


With the continued growth in storage capacity and complexity, it 
is important to have tools to analyze and debug I/O problems. This 
article introduced several scripts in the DTraceToolkit that can be 
used to easily observe I/O behavior in ways that were previously 
difficult or impossible. Because Solaris 10, DTrace, and the 
DTraceToolkit are free to download and use, truly understanding 
your applications is just a few clicks away. 


References 

DTrace Documentation: http://www.sun.com/bigadmin/content/dtrace

DTraceToolkit: http://www.opensolaris.org/os/community/dtrace/dtracetoolkit

Solaris 10: http://www.sun.com/download


Acknowledgments 


Ryan thanks Brendan Gregg for reviewing this article and writing
the DTraceToolkit. Ryan would also like to thank the DTrace developers
for their awesome contribution to system and application
observation.


Ryan Matteson works as a systems engineer, and specializes in Web technologies,
SANs, and the OpenBSD, Linux, and Solaris operating systems.
When Ryan isn’t busy working, he enjoys playing guitar and maintaining his
blog at: daemons.net/~matty. Questions and comments about this article
can be addressed to: matty@daemons.net.


December 2005 


SECURITY


Security Forensics Using DTrace


Boris Loza, PhD 


Solaris 10 has introduced a new tool for Dynamic Tracing in
the OS environment: dtrace. This is a very powerful tool
that allows systems administrators to observe and debug
the OS behavior or even to dynamically modify the kernel.
Although this tool has been designed primarily for developers
and administrators, in this article, I will explain how to use dtrace
as a security forensics tool for analyzing suspicious files and
processes.


Using DTrace 


Dtrace has its own C/C++-like programming language called
“D language” and comes with many different options. Dtrace is
easy to use, once you familiarize yourself with D language and
know a little bit about Solaris internals. For example, the following
construction using dtrace could serve as a “primitive” strace
command for the process ID 510:


# dtrace -n syscall:::'/pid == 510/ {}'


I will explain the syntax of such constructions in a moment.

To understand how to use dtrace as a security forensic tool, I will
present a case study, as follows. Let’s assume that we are going to
investigate the process ID 968 from the suspicious “srg” application that
we found running on our system.

Typing the following at the command line will list all files that this
process opens at the time of monitoring. We’ll let this run for a while
and terminate with Control-C:


# dtrace -n syscall::open:entry'/pid == 968/
{ printf("%s %s",execname,copyinstr(arg0)); }'
dtrace: description 'syscall::open*:entry' matched 2 probes
^C
CPU     ID                    FUNCTION:NAME
  0     14                       open:entry srg /var/ld/ld.config
  0     14                       open:entry srg /lib/libdhcputil.so.1
  0     14                       open:entry srg /lib/libsocket.so.1
  0     14                       open:entry srg /lib/libnsl.so.1


D language comes with its own terminology, which I will address
here briefly. The whole 'syscall::open:entry' construction is
called a “probe” and defines a location or activity to which
dtrace binds a request to perform a set of “actions”. The 'syscall'
element of the probe is called a “provider”, and, in this case, permits
probes on 'entry' (start) to any 'open' Solaris system call (the 'open'
syscall is issued when a file is about to be opened). The so-called
“predicate” (/pid == 968/) uses the predefined dtrace variable
'pid', which always evaluates to the process ID associated with
the thread that fired the corresponding probe.

The 'execname' and 'copyinstr(arg0)' are called “actions”
and define the name of the current process executable file and
convert the first integer argument of the system call (arg0) into a
string format, respectively. The printf action uses the same
syntax as in C language and serves the same purpose: to format
the output.

Each D program consists of a series of clauses, with each clause
describing one or more probes to enable and an optional set of
actions to perform when the probe fires. The actions are listed as a
series of statements enclosed in curly brackets { } following the
probe name. Each statement ends with a semicolon (;).

Each probe clause has the general form:


probe descriptions
/ predicate /
{
        action statements
}


You may want to read the dtrace(1M) 
man pages and the Introduction to the Solaris Dynamic Tracing 
Guide at: 


http://docs.sun.com/app/docs/doc/817-6223 


for more options and further explanation of the syntax. 
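
As a reading aid for descriptions such as 'syscall::open:entry', the Python sketch below splits a probe description into the four colon-separated fields that DTrace documents (provider, module, function, name). This is only an illustration of the naming scheme, not dtrace's own parser: ProbeDescription and parse_probe are names invented here, and empty fields are simply returned as empty strings, which DTrace treats as "match anything".

```python
from typing import NamedTuple

class ProbeDescription(NamedTuple):
    provider: str
    module: str
    function: str
    name: str

def parse_probe(description):
    """Split a description as given to `dtrace -n` into its four fields.

    Missing leading fields and empty fields are returned as "" (match anything).
    """
    parts = description.strip().split(":")
    if len(parts) > 4:
        raise ValueError(f"too many fields in probe description: {description!r}")
    # dtrace -n reads [[[provider:]module:]function:]name, so pad on the left.
    parts = [""] * (4 - len(parts)) + parts
    return ProbeDescription(*parts)

print(parse_probe("syscall::open:entry"))
print(parse_probe("mib:ip::"))
```

Read this way, 'syscall::open:entry' names the syscall provider, any module, the open function, and the entry probe.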

As the name suggests, the dtrace (Dynamic Trace) utility will
show you the information about a changing process dynamically.
That is, if the process is idle (doesn’t make any system calls or
open new files), you won’t be able to get any information. To
analyze such a process, either restart it or use “static” methods
and utilities, such as mdb(1), or process analyzing commands,
such as pfiles(1).

Next, we will use the following command-line construction to
list all system calls for “srg”. Again, we'll let this run for a while
and terminate it with Control-C:


# dtrace -n 'syscall:::entry /execname == "srg"/
{ @num[probefunc] = count(); }'
dtrace: description 'syscall:::entry ' matched 226 probes
^C

pollsys
getrlimit
connect
setsockopt


You may recognize some of the building elements of this small D
program. Additionally, this clause defines an aggregation named 'num',
keyed by 'probefunc' (the name of the system call that was executed),
and stores for each key the number of times that call was made (count()).
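
For readers more familiar with general-purpose languages, the effect of this clause is close to keeping a dictionary of counters. The Python sketch below is an analogy only (DTrace aggregations live in the kernel and are merged per CPU); the event list is hypothetical and count_by_key is a name invented for illustration:

```python
from collections import Counter

def count_by_key(events):
    """Tally events per key, as `@num[probefunc] = count();` does per system call."""
    num = Counter()
    for probefunc in events:
        num[probefunc] += 1
    return num

# Hypothetical stream of system call names, one per probe firing.
fired = ["pollsys", "connect", "pollsys", "setsockopt", "pollsys"]
# Print the keys in ascending order of their counts.
for probefunc, n in sorted(count_by_key(fired).items(), key=lambda kv: kv[1]):
    print(f"{probefunc:>12} {n}")
```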

Using dtrace, we can easily emulate utilities traditionally being
used to analyze suspicious binary files and processes. However,
dtrace is a much more powerful tool and may provide more functionality;
for example, you can dynamically monitor the stack of the
process in question:


# dtrace -n 'syscall:::entry/execname == "srg"/{ustack()}'

  0    286                lwp_sigmask:entry
              libc.so.1`__systemcall6+0x20
              libc.so.1`pthread_sigmask+0x1b4
              libc.so.1`sigprocmask+0x20
              srg`srg_alarm+0x134
              srg`scant+0x400
              srg`net_read+0xc4
              srg`main+0xabc
              srg`_start+0x108


Based on all our investigation (see the list of opened files, syscalls, 
and the stack examination above), we may positively conclude that 
srg is a network-based application. Does it write to the network? 
Let’s check by constructing the following clause: 


# dtrace -n 'mib:ip::/execname == "srg"/{@[execname]=count()}'
dtrace: description 'mib:ip::' matched 412 probes
dtrace: aggregation size lowered to 2m
^C
  srg                                                             520

It does. We used the 'mib' provider to find out whether this application
transmits to the network.

Could it be just a sniffer or a netcat-like application that is
bound to a specific port? Let’s run dtrace in the truss(1) fashion
to answer this question (inspired by Brendan Gregg’s dtruss utility
from users.tpg.com.au/adsln4yb/DTrace/shellsnoop):


#!/usr/bin/sh
#
# The D program is kept in single quotes so that the shell only
# substitutes the command name ($1) into it.
dtrace='
inline string cmd_name = "'$1'";

/*
** Save syscall entry info
*/
syscall:::entry
/execname == cmd_name/
{
        /* set start details */
        self->start = timestamp;
        self->arg0 = arg0;
        self->arg1 = arg1;
        self->arg2 = arg2;
}

/* Print data */
syscall::write:return,
syscall::pwrite:return,
syscall::*read*:return
/self->start/
{
        printf("%s(0x%X, \"%S\", 0x%X)\t\t = %d\n", probefunc, self->arg0,
            stringof(copyin(self->arg1, self->arg2)), self->arg2, (int)arg0);

        self->arg0 = arg0;
        self->arg1 = arg1;
        self->arg2 = arg2;
}
'

# Run dtrace
/usr/sbin/dtrace -x evaltime=exec -n "$dtrace" >&2


Save it as truss.d, change the permissions to executable, and run:


# ./truss.d srg

0 13 write:return write(0x1, " sol10 -> 192.168.2.119
TCP D=3138 S=22 Ack=713701289 Seq=3755926338 Len=0
Win=49640\n8741 Len=52 Win=16792\n\0", 0x5B) = 91

0 13 write:return write(0x1,
"192.168.2.111 -> 192.168.2.1 UDP D=1900 S=21405 LEN=140\n\0",
0x39) = 57

^C


Looks like a sniffer to me, with probably some remote logging 
(remember the network transmission by ./srg discovered by the 
'mib' provider above). You can write pretty sophisticated programs 
for dtrace using D language. Take a look at /usr/demo/dtrace for 
some examples. 

You may also use dtrace for other forensic activities. The following
is an example of a more complex script that allows monitoring
of who fires the suspicious application and records all the files
opened in the process:


#!/usr/bin/sh

command=$1

/usr/sbin/dtrace -n '
inline string COMMAND = "'$command'";
#pragma D option quiet

/*
** Print header
*/
dtrace:::BEGIN
{
        /* print headers */
        printf("%-20s %5s %5s %5s %s\n","START_TIME","UID","PID",
            "PPID","ARGS");
}

/*
** Print exec event
*/
syscall::exec:return, syscall::exece:return
/(COMMAND == execname)/
{
        /* print data */
        printf("%-20Y %5d %5d %5d %s\n",walltimestamp,uid,pid,ppid,
            stringof(curpsinfo->pr_psargs));

        s_pid = pid;
}

/*
** Print open files
*/
syscall::open*:entry
/pid == s_pid/
{
        printf("%s\n", copyinstr(arg0));
}
'


Save this script as wait.d, make it executable with 'chmod +x wait.d',
and run:


# ./wait.d srg


Once srg is started, you will see the output:


START_TIME             UID   PID  PPID ARGS
2005 May 16 19:51:20     0  1582  1458 ./srg
/var/ld/ld.config
/lib/libnsl.so.1
/lib/libsocket.so.1
/lib/libresolv.so.2
^C


The real power of dtrace comes from the fact that you can do
things with it that otherwise aren’t possible without writing a
comprehensive C program. For example, the shellsnoop application
written by Brendan Gregg allows you to use dtrace in the capacity of
ttywatcher; see:


http://users.tpg.com.au/adsln4yb/DTrace/shellsnoop


It is not possible to show all the capabilities of dtrace in this article.
Dtrace is a very powerful as well as complex tool with virtually
endless capabilities. Although Sun insists that you don’t need a
“deep understanding of the kernel for DTrace to be useful”,
knowledge of Solaris internals is a real asset. Looking at the
include files in the /usr/include/sys/ directory may help you write
complex D scripts and give you more of an understanding of how
Solaris 10 is implemented.


Conclusion 


When monitoring your systems, be creative and observant. 
Apply all your knowledge and experience for analyzing suspicious 
binary files and processes. Also, be patient, have a sense of humor, 
and learn how to use dtrace. 


Boris Loza, PhD, CISSP is an author of UNIX, Solaris and Linux: A 
Practical Security Cookbook, which deals with securing UNIX OS without 
any third-party applications. Boris is also a contributor to several industry 
magazines and has been quoted in many information security books and 
Web sites. He loves nature, reading books, watching movies, and enjoys 
scuba diving and entomology. 
Simplifying Solaris™ Patches


Maintaining large numbers of systems over time can cause small (and
sometimes large!) differences in the system configurations. Some of
these differences may occur in the system’s patch levels. Perhaps an
ad hoc patch for the development environment never made it to
production or space issues prevented a patch from getting installed.
Regardless of the reason for these differences, it’s good to have a
way to measure the accuracy of your configurations. When issues
arise, knowing these differences will aid you in troubleshooting
problems.

A related patching issue for Solaris admins is finding necessary
prerequisite patches (e.g., patch X needs patch Y, which in turn
needs patch Z). This process can be both time consuming and
frustrating. I have encountered both these problems, and I have
worked on solving them with two simple Perl scripts. In this article,
I’ll show how to use both of the scripts as well as provide the
source code so others can benefit from them. I will also briefly
discuss some Sun tools that can complement these Perl scripts.


Patch Information

These scripts are made possible by a file that Sun already publishes
daily. This file, called patchdiag.xref, is available from SunSolve
and lists every patch Sun has released along with related
information. That information includes the latest minor revision,
flags indicating whether it is recommended or security related, which
version of the OS it is for, necessary prerequisite patches, release
date, patch architecture, packages affected, and a simple
description.

Here’s a sample line from patchdiag.xref. It’s all in one line,
wrapped here for clarity. The file can be hard to read but you can see
some of the fields I described:


117702|01|Sep/01/04| ||||8|sparc;108528-29; |SUNWcsl:11.8.0, \
REV=2000.01.08.18.12;SUNWcslx:11.8.0,REV=2000.01.08.18.12; \
|SunOS 5.8: scsi plugin patch


Chris Josephes has written an excellent set of Perl modules, called
sol-inst, to deal with installation information on Solaris systems.
One module in particular deals with reading the information in the
patchdiag.xref file. Unfortunately, the modules have not been
updated since 2000, but I find that they are still useful.

Sun Patch Check Tool (patchk.pl) is a free Perl script from
Sun. Sys admins can run Sun Patch Check to get a report of which
patches are installed, which are installed but not at the current
revision, and which are not installed at all. The report can categorize
patches in the following ways: recommended, security, software-
related, and obsolete. It can be generated in plain text or HTML.
This software is no longer available, however, and Sun suggests
that you transition to Patch Manager instead.

Sun Patch Manager 2.0, which grew out of the older PatchPro
program, is the primary tool for managing patches on Solaris
systems. It is available for both Solaris 8 and 9, but some features do
not work on Solaris 8. It provides both a Web browser interface
(Solaris 9 only) and a command-line interface that can analyze a
system to determine missing patches. It also can be used to download
patches and to apply them to or remove them from your systems.


Scripts


Sun recognized the problems different patch levels can cause and
produced a small, free utility called Patch Comparison Utility (PCMP).
PCMP compares the showrev output of two servers, identifies the
patches the two files do not have in common, and produces an output
report. Even though PCMP is a great tool, it has a few drawbacks.
First, it is released as a binary, so changes and improvements cannot
be made. Second, it only compares two servers at a time. And last, it
can be hard to find! It seems no longer to be available from Sun’s
Web site, but it has been archived by a few people on various Web
sites (see Resources).

Those issues led me to build a new tool in Perl to compare 
patches (Listing 1; compare_patches.pl). This new version can com- 
pare patch levels on an arbitrary number of servers and is open so it 
can be improved. This script also can use the output of showrev or 
Sun’s patchk.pl tool to compare the patch levels (although patchk.pl 
is deprecated, I originally wrote the script while that tool was in 
use). The script’s structure is simple. It validates the input files and 
reads them into hash data structures in memory. Then it iterates over 
each patch on each server to check for differences in the minor revi- 
sions. It will then print the report output to the screen. Patches with 
differences will be marked. 
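
The comparison logic just described can be sketched in a few lines. The Python below is an illustration only, not a port of Listing 1: the names parse_showrev and compare_levels are invented for this example, and it assumes that installed patches appear in showrev -p output as lines starting with "Patch: NNNNNN-RR".

```python
import re

# Assumption: installed patches appear as lines starting "Patch: NNNNNN-RR".
PATCH_LINE = re.compile(r"^Patch: (\d{6})-(\d{2})")


def parse_showrev(text):
    """Return {patch id: highest revision} for one server's showrev -p text."""
    levels = {}
    for line in text.splitlines():
        match = PATCH_LINE.match(line)
        if match:
            patch_id, rev = match.groups()
            # Revisions are always two digits, so string comparison orders them.
            if rev > levels.get(patch_id, ""):
                levels[patch_id] = rev
    return levels


def compare_levels(servers):
    """servers maps a server name to its {patch id: revision} dict.

    Returns one (patch id, [revision per server], differs) tuple per patch,
    with "N/A" standing in for a patch that a server does not have.
    """
    all_patches = sorted(set().union(*servers.values()))
    rows = []
    for patch_id in all_patches:
        revs = [levels.get(patch_id, "N/A") for levels in servers.values()]
        rows.append((patch_id, revs, len(set(revs)) > 1))
    return rows


if __name__ == "__main__":
    servers = {
        "server1": parse_showrev("Patch: 109134-03\nPatch: 111879-00\n"),
        "server2": parse_showrev("Patch: 109134-02\nPatch: 111879-03\n"),
    }
    for patch_id, revs, differs in compare_levels(servers):
        print(patch_id + ("*" if differs else ""), revs)
```

Keeping the revisions as strings means a missing patch ("N/A") is never mistaken for revision 00, which a purely numeric comparison could do.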

Another common problem in Solaris patching is finding the 
necessary prerequisite patches. To solve this problem, I wrote 
another Perl script, called find_reqs.pl (Listing 2), that uses Sun’s 
patchdiag.xref file to recursively find all needed patches for a 
single patch. 

The structure of find_reqs.pl is easy to follow. It scans the
patchdiag.xref file and creates a data structure mapping patch ids to
necessary prerequisites. Once that is done, it’s just a matter of doing a
recursive search starting with the patch of interest. Then it weeds
out any duplicate patch references and prints an ordered list of
patches. The output from find_reqs.pl lists the patches in an order in
which you can install them.
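
The prerequisite search can also be sketched outside of Perl. This illustration is not the article's script: it uses a depth-first, post-order walk with a visited set instead of collecting, de-duplicating, and reversing a list, so the order it prints may differ from that of find_reqs.pl while still placing every prerequisite before the patch that needs it. The function names and the sample records (revisions, dates, descriptions) are invented for the example; the prerequisite field position follows Listing 2.

```python
import re


def parse_xref(lines):
    """Map each patch id to the ids of its prerequisite patches."""
    prereqs = {}
    for line in lines:
        if line.startswith("#"):
            continue  # comment line
        fields = line.rstrip("\n").split("|")
        if len(fields) < 9:
            continue  # not a patch record
        # Field 8 holds the architecture followed by "NNNNNN-RR;" prerequisites.
        prereqs[fields[0]] = re.findall(r"(\d{6})-\d{2};", fields[8])
    return prereqs


def install_order(patch, prereqs):
    """Return patch ids so that prerequisites precede the patches needing them."""
    order = []
    seen = set()

    def visit(patch_id):
        if patch_id in seen:
            return
        seen.add(patch_id)  # marked before recursing, so a cycle cannot loop
        for dep in prereqs.get(patch_id, []):
            visit(dep)
        order.append(patch_id)

    visit(patch)
    return order


# Made-up records: only the ids and their prerequisite links matter here.
SAMPLE = [
    "## sample patchdiag.xref data",
    "108528|29|Jan/01/04| | | | |8|sparc;108987-13;111111-04;111310-01;|pkgs|kernel",
    "108987|13|Jan/01/04| | | | |8|sparc;112396-02;|pkgs|example",
    "111111|04|Jan/01/04| | | | |8|sparc;|pkgs|example",
    "111310|01|Jan/01/04| | | | |8|sparc;|pkgs|example",
    "112396|02|Jan/01/04| | | | |8|sparc;|pkgs|example",
]

if __name__ == "__main__":
    print(" ".join(install_order("108528", parse_xref(SAMPLE))))
```

Marking a patch as seen before visiting its prerequisites handles duplicates and guards against a circular reference in the data in one step.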

Each script has a help option to explain the basic parameters and 
both scripts have a debug option to be more verbose while running. 
Find_reqs.pl has an option to specify a particular patchdiag.xref 
file; otherwise, the default is to look in the current directory. Also, 
compare_patches.pl has options to hide the header lines, to show 
only the patches with differences, or to switch the input files from 
showrev format to patchk format. 

Here is some sample output from compare_patches.pl. This
example shows the comparison of three servers whose showrev -p
output is listed in files server1, server2, and server3. The output
displays the patch id and the corresponding minor patch revisions
on each server. The differences are marked with a “*” and are
summarized at the end:


% ./compare_patches.pl server1 server2 server3


Patch ID || Rev #        | Rev #        | Rev #        |
109134*  ||  server1:  03 | server2:  02 | server3:  02 |
109135   ||  server1:  04 | server2:  04 | server3:  04 |
111873*  ||  server1:  01 | server2: N/A | server3: N/A |
111874*  ||  server1: N/A | server2:  03 | server3:  01 |
111879*  ||  server1:  00 | server2:  03 | server3:  03 |

Different patches: 4 out of total 5


Here is another output sample from the same data but only showing
the differences and without the header line. The “noheader” option
also omits the trailing summary line:


% ./compare_patches.pl -diff -noheader server1 server2 server3


109134*  ||  server1:  03 | server2:  02 | server3:  02 |
111873*  ||  server1:  01 | server2: N/A | server3: N/A |
111874*  ||  server1: N/A | server2:  03 | server3:  01 |
111879*  ||  server1:  00 | server2:  03 | server3:  03 |


Listing 1 compare_patches.pl

#!/usr/bin/perl
#
# Compare patches installed across Sun servers
# NEPD Consulting
# Paul Guglielmino <paulg@nepd.com>
#
# Please send improvements or comments to me!
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#

use Getopt::Long;
my($version) = "1";

my($want_only_diffs, $showrev, $patchk, $debug, $usage, $v, $noheader);
Getopt::Long::Configure( 'permute' );
GetOptions( 'diffs' => \$want_only_diffs,
            'showrev' => \$showrev, 'patchk' => \$patchk,
            'debug' => \$debug, 'help' => \$usage,
            'version' => \$v, 'noheader' => \$noheader );

if ( $usage ) { print_usage(); exit 0; }
if ( $v ) { print_version(); exit 0; }

#####
my(@server_files, @server_patch_levels) = ();
my(%server_patch_levels) = ();

foreach my $datafile (@ARGV) {
    if ( ! -f $datafile ) {
        print "File $datafile not available, skipping\n";
    } else {
        push( @server_files, $datafile );
    }
}

if ( $#server_files < 1 ) {
    print "Sorry need at least two files to analyze.\n";
    print_usage();
    exit 2;
}

# Assume showrev output unless told otherwise
my($sub_ref);
if ( $patchk ) {
    $sub_ref = \&parse_patchk_file;
} else {
    $sub_ref = \&parse_showrev_file;
}

# Load each file into memory
for my $i (0..$#server_files) {
    $debug && print "Parsing $server_files[$i] ...\n";
    open(my $fh, '<', $server_files[$i]) || die "Cannot open $server_files[$i]: $!\n";
    &$sub_ref($fh, $i);
    close($fh);
}

if ( ! $noheader ) {
    print_header();
}

my($patchdiffs) = 0;
my(%count) = ();

# Determine all patch ids in use by iterating over all servers and patches
foreach my $s (keys %server_patch_levels) {
    foreach my $p (keys %{$server_patch_levels{$s}}) {
        $count{$p} = 1;
    }
}

my($numpatches) = scalar keys %count;
# Sorted so that the report order is stable from run to run
my(@all_patches) = sort keys %count;

# Key off the patch id to see what level each server is at
foreach my $p (@all_patches) {
    my($linestring, $diff_string) = ("", "");
    my($diff_patch_flag) = 0;

    for my $i (0..$#server_files) {
        $linestring .= sprintf " %s: %3s |", $server_files[$i],
            $server_patch_levels{$i}{$p} || "N/A";
    }

    for my $j (0..$#server_files-1) {
        # Compare as strings so that a missing patch ("N/A") is not
        # treated as equal to revision 00
        my($this_rev) = $server_patch_levels{$j}{$p}   || "N/A";
        my($next_rev) = $server_patch_levels{$j+1}{$p} || "N/A";
        if ( $this_rev ne $next_rev ) {
            $diff_patch_flag = 1;
        }
    }

    if ( $diff_patch_flag ) {
        $patchdiffs++;
        $diff_string .= "*";
    }

    if ( (! defined $want_only_diffs) ||
         ($diff_patch_flag && $want_only_diffs) ) {
        printf "%-6s%-2s || %s\n", "$p", $diff_string, $linestring;
    }
}

if ( ! $noheader ) {
    print "\nDifferent patches: $patchdiffs out of total $numpatches\n";
}

exit 0;
Listing 1 continued

# Load patch information from "showrev -p" file
sub parse_showrev_file {
    my($fh,$i) = @_;
    while( <$fh> ) {
        # Match only the lines that have installed patches
        if ( /^Patch: (\d{6})-(\d{2})/ ) {
            $debug && print "Matched patch: $1 - $2\n";
            # Load the patch and revision number into our hash if it:
            # Is not already there or if it is there with a lower
            # revision number
            if ( (! defined $server_patch_levels{$i}{$1}) ||
                 ($server_patch_levels{$i}{$1} < $2) ) {
                $server_patch_levels{$i}{$1} = $2;
            }
        }
    }
}

# Load patch information from Sun "patchk.pl" file
sub parse_patchk_file {
    my($fh,$i) = @_;
    while( <$fh> ) {
        if ( /^(\d{6}) (\d{2})/ ) {
            $debug && print "Matched patch: $1 - $2\n";
            # Load the patch and revision number into our hash if it:
            # Is not already there or if it is there with a lower revision
            # number
            if ( (! defined $server_patch_levels{$i}{$1}) ||
                 ($server_patch_levels{$i}{$1} < $2) ) {
                $server_patch_levels{$i}{$1} = $2;
            }
        }
    }
}

# Print results header to the screen
sub print_header {
    my($line) = "Patch ID || ";
    foreach my $s (@server_files) {
        my($len) = length($s) - 1;
        # The trailing padding is a reconstruction; the printed listing
        # was illegible at this point
        $line .= "Rev #" . " " x $len . "  | ";
    }
    print "$line\n";
}

sub print_usage {
    print <<EOT;
Show differences between patch levels on Solaris servers

Usage: $0 [OPTIONS]... [FILES]...

\t --help       [This message]
\t --debug      [Show debugging output]
\t --diffs      [Show only differences]
\t --noheader   [Do not print the header and summary lines]
\t --showrev    [Input files are in showrev -p format (default)]
\t --patchk     [Input files are in Sun patchk.pl format]
\t --version    [Print version number]
EOT
}

sub print_version {
    print "\n$0: Version $version\n\n";
}


Listing 2. find_reqs.pl

#!/usr/bin/perl
#
# NEPD Consulting
# Paul Guglielmino <paulg@nepd.com>
#
# Please send improvements or comments to me!
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#

use strict;
use warnings;
use Getopt::Long;

my($version) = "1";

my($xref, $debug, $usage, $v);
Getopt::Long::Configure( 'permute' );
GetOptions( 'xref=s' => \$xref, 'debug' => \$debug,
            'help' => \$usage, 'version' => \$v );

if ( $usage ) { print_usage(); exit 0; }
if ( $v ) { print_version(); exit 0; }

my($patch) = $ARGV[0];
if ( ! $patch ) {
    print_usage();
    exit 2;
}

# Holds all information on prerequisite patches. It is a hash of arrays.
my(%prereqs, %patchrevs) = ();

# Take cmd line options or supply reasonable defaults
$debug = $debug || 0;
my($patchdiag) = $xref || "patchdiag.xref";

# Load information from our Patchdiag file
read_patchdiag($patchdiag);

# Generate a list of all required patches. The list is reversed before
# the duplicates are removed, so that a prerequisite shared by several
# patches is kept ahead of every patch that needs it.
my(@required_patches) = reverse(find_allreqs($patch));
@required_patches = do { my %seen; grep !$seen{$_}++, @required_patches };
print "| ", join(" ", @required_patches), " |\n";

exit 0;

# Use already defined structures to recurse through needed patches
sub find_allreqs {
    my $i;
    my $p = $_[0];
    my(@patches_to_check, @patch_list);
    # "defined @array" is rejected by newer Perl releases, so test the hash key
    if ( exists $prereqs{$p} ) {
        $debug && print "Recursing down from: $p\n";
        @patches_to_check = @{$prereqs{$p}};
        push(@patch_list, $p);
        foreach $i (@patches_to_check) {
            push(@patch_list, find_allreqs(patchid($i)));
        }
    } else {
        $debug && print "Leaf patch: $p\n";
        push(@patch_list, $p);
    }
    return @patch_list;
}

# Return just the major patch id
sub patchid {
    return( (split(/-/, $_[0]))[0] );
}

# Return just the rev number
sub patchrev {
    return( (split(/-/, $_[0]))[1] );
}

# Read the entire file, look out for comments and other things not
# directly related to the patch information
sub read_patchdiag {
    my($patchdiag_file) = $_[0];
    my(@line, @prs, $x, $p);

    open(PD, "$patchdiag_file") || die "No $patchdiag_file found\n";
    while( <PD> ) {
        # Skip comment lines (this test is reconstructed; the printed
        # line was illegible)
        if ( ! /^#/ ) {
            @line = split(/\|/);
            next unless defined $line[8];
            @prs = ($line[8] =~ /(\d{6}-\d{2};)+?/g);
            $patchrevs{patchid($line[0])} = patchrev($line[0]);
            #print "$line[0] - @prs\n";
            foreach $p (@prs) {
                push( @{$prereqs{$line[0]}}, patchid($p) );
            }
        }
    }
    close(PD);
}

sub print_usage {
    print <<EOT;
Show all required prerequisites for a Solaris patch.

Usage: $0 [OPTIONS]... Patch Number
\t --help               [This message]
\t --debug              [Show debugging output]
\t --xref=<filename>    [Match patches against given xref file]
\t --version            [Print version number]
EOT
}

sub print_version {
    print "$0: Version $version\n\n";
}


Here is some example output from running find_reqs.pl on kernel
patch 108528. Patch 108528 requires patches 108987, 111111, and
111310. Patch 108987 requires 112396, and the others do not require
any prerequisites:


% ./find_reqs.pl -xref=/patches/patchdiag.xref 108528
| 111310 111111 112396 108987 108528 |


Improvements

For the compare_patches.pl script, a way to show whether the
patches are at the latest revision would be very useful. Also, I
would like the Perl scripts to run with warnings and strict turned
on. find_reqs.pl is there, but compare_patches.pl needs a bit more
tweaking. Future versions of these scripts can also be written to
take advantage of functionality in the sol-inst modules.


Resources

patchdiag.xref:
http://sunsolve.sun.com/patches

Patch Comparison Utility (PCMP):
http://www.scn.rain.com/pub/solaris/

Sol-inst modules:
http://www.cpan.org/modules/by-category/04_Operating_System_Interfaces/Solaris/sol-inst-0.90a.tar.gz

Sun Patch Manager:
http://www.sun.com/download/products.xml?id=3f9d714b
Generate Lightweight Web Database Applications
Automatically with FormGen
Robert J. Bond 3rd


As a systems administrator, I often have a need for a simple
database application accessible through a Web interface.

A utility called FormGen allows you to automatically
generate such applications quickly and easily. You may need to
create a database from scratch, or you may only need a Web
interface into an existing database. FormGen handles either scenario
with ease.

FormGen is written in Perl and has been tested on Solaris,
FreeBSD, and Linux. Out of the box, it uses the MySQL database
but is easily modified to work with other Perl-supported SQL
databases. FormGen was originally created in 2003 and was designed to
automate the creation of very large HTML forms. In 2004, it was
extended to output the SQL required to create the supporting
table(s), and in 2005, it was extended to output the Perl code
required to manage the supporting table(s). In this article, I'll first
tell you how to use FormGen then offer additional information that
you may find useful.

You can download the latest version of FormGen here:


http://www.everypageinc.com/sysadmin


Overview


Some databases have only one table, while others may have as
many as a hundred or more. With FormGen, regardless of the total
number of tables, you generate your application one table at a time.
For each table, you create one configuration file, with the name
<table>.formgen (where <table> is the name of your table, with no
spaces). FormGen then reads the configuration file and generates a
fully functional Perl CGI script with the ability to browse, edit, add,
and delete records.

A typical database consists of several related tables, which
means that the value stored in one table is derived from a list of
values in another table. FormGen handles related tables through the
use of lookups (more on this later).


Getting Started


To generate a Web database application, do the following:


# ./formgen <database> <table>


where <database> is the name of the database, and <table> is the
name of your database table. FormGen will look in the current
directory for a special configuration file called <table>.formgen.
Assuming the configuration file is found, FormGen will create two
new files in the current directory:


<table>.pl: The Perl CGI script to manage your data.
<table>.sql: A text file containing the SQL to create the table (if
necessary).


All we need to do is create the FormGen configuration file, which is
easy.


Creating a Configuration File 

If you’re eager to get started, simply take a look at the sample
configuration files on the Sys Admin Web site
(http://www.sysadminmag.com), which are commented. If you
want a more in-depth explanation, please read on.

A FormGen configuration file ends with the extension “.formgen”.
One .formgen file creates a script for one table, so if you have
multiple tables (which is likely), create a separate configuration file for
each table. The .formgen configuration file is a tab-delimited text
file. Comments begin with a # character. Each line provides
information for a single column in the table. The information is in the
format:


Label <tab> Column <tab> Type <tab> Extra Info

where:


• Label is a human-readable name for the column (spaces allowed).
• Column is the actual column name (no spaces).
• Type is one of: hidden, text, textarea, select, lookup (explained
later).
• Extra Info is dependent on the value of Type.


Example: The Tech Table 


Let’s assume we need a database 
called “workorders” with a table 
called “tech”, which will store each 
tech’s name, email, and a comment 
field. Here’s the configuration file, 
tech.formgen: 


ID <tab> id <tab> hidden 
Name <tab> name <tab> text 
Email <tab> email <tab> text 


That’s it: just three lines to create a table with three columns. The
first column (id) has a type “hidden”, which indicates that we don’t
want the user to change this value (because it’s created and managed
by the database itself). The second and third columns are of type
“text”, which means they are simple entry fields. (We’ll look at more
complex types below.)

If our database doesn’t exist yet, we create it. That means going
into the MySQL command-line client:


# mysql -u <username> -p

The client prompts us for our password. Now we are at the mysql
prompt. We create the database and grant permissions to a user
(internal to MySQL, not a Unix user):

mysql> create database workorders;
mysql> grant all on workorders.* to 'workorders' identified by 'workorders';
mysql> quit;

By default, we’ve set the database username and password to the
same value, which is obviously something you’ll want to change!


FormGen generates two files for the “tech” table: 


tech.pl: The Perl CGI script.
tech.sql: Which can create the new table.


Let’s create the new “tech” table: 


# mysql -u <username> -p < tech.sql 


Next, let’s move tech.pl to a directory that is configured for CGI
scripts. We also have to set the execute bit on the script so that
Apache will execute it:


# mv tech.pl /usr/local/apache/cgi-bin
# cd /usr/local/apache/cgi-bin
# chmod +x tech.pl


And now we can test it. Try adding records to the table. If you get
“Internal Error”, perhaps you have not allowed the script to execute.
Alternatively, the first line of the script assumes the location of your
Perl. If it’s not /usr/bin/perl, edit the first line so that it accurately
reflects the location of Perl on your system.


Additional Types 


In addition to the two types listed above, FormGen can create the 
following field types: 


• “textarea” type for long, scrollable comment fields.
• “select” type for simple dropdown values.
• “lookup” type for dropdown values in a separate table.


Let’s look at each of these in turn. 

The type “textarea” is simply a scrollable text input field. 
Whereas “text” is stored in the database with up to 255 characters, 
“textarea” is significantly larger. 

The type “select” will produce a drop-down containing a list of 
values (using the HTML select tag). How do you tell FormGen 
which values to use? Simply add another tab and list them, sepa- 
rated by spaces. For example: 


Location <tab> location <tab> select <tab> Springfield Santa_Fe 


Convert any spaces within a value to underscores — the user won’t 
see the underscores. 

The type “lookup” is fairly sophisticated. It’s just like select, 
except that it gets its values from another table in the same data- 
base. So to look up “location” in the “location” table, using that 
table’s “city” column, we would add this line to the configuration 
file: 


Location <tab> location <tab> lookup <tab> location <tab> city 


This presumes the existence of a table named “location”, which 
contains a column “city” (a unique identifier/auto-incrementing key 
named “id” is also assumed). 
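
To make the file format concrete, here is a minimal sketch of a reader for it. It is written in Python purely for illustration (FormGen itself is Perl) and is not taken from FormGen's source: the function name parse_formgen and the dictionary keys are invented for this example, and the handling of Extra Info follows only what the text above states.

```python
VALID_TYPES = {"hidden", "text", "textarea", "select", "lookup"}


def parse_formgen(text):
    """Parse .formgen-style text into a list of column descriptions."""
    columns = []
    for raw in text.splitlines():
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue  # blank line or comment
        parts = [part.strip() for part in raw.split("\t")]
        if len(parts) < 3 or parts[2] not in VALID_TYPES:
            raise ValueError("unrecognized line: %r" % raw)
        column = {"label": parts[0], "column": parts[1], "type": parts[2]}
        extra = parts[3:]
        if column["type"] == "select":
            # Values are space-separated; underscores stand in for spaces.
            values = " ".join(extra).split()
            column["values"] = [value.replace("_", " ") for value in values]
        elif column["type"] == "lookup":
            if len(extra) < 2:
                raise ValueError("lookup needs a table and a column: %r" % raw)
            column["lookup_table"] = extra[0]
            column["lookup_column"] = extra[1]
        columns.append(column)
    return columns


if __name__ == "__main__":
    sample = (
        "# request.formgen (hypothetical)\n"
        "ID\tid\thidden\n"
        "Site\tsite\tselect\tSpringfield Santa_Fe\n"
        "Location\tlocation\tlookup\tlocation\tcity\n"
    )
    for column in parse_formgen(sample):
        print(column)
```

Rejecting unknown types early gives a clearer error than letting a typo in the Type field surface later in generated code.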

While the “lookup” type is somewhat complex compared to the 
“select” type, it allows you to build multiple FormGen scripts that 
can serve as the backbone of a very large application with numerous 
related tables. 


Sample Application 

The Sys Admin Web site contains all the files related to a simple 
workorder application. This application tracks techs, locations, and 
requests. 

The application has three tables: 


tech — Stores the list of techs. 

location — Stores the various locations. 

request — Stores each work request, along with location and the 
tech assigned. 
A help desk person enters requests as they come in, assigning the
tech and location. Later, the tech accesses the request, marks the
request as resolved, and re-submits.

This trivial application is intended only to demonstrate the ease 
of creating multi-table Web applications with table lookups (request 
looks up the location and the technician). 


Modifying the Code 


If you have some Perl experience, you can modify the generated 
code very easily. If you want to back out of your changes, you can 
always regenerate the code. 

Two modifications most people will want to make are as follows: 


1. Database username/password: FormGen connects to the database
via the db_connect subroutine, which contains the database
username and password, which are both set by default to the
database name.

2. If your application has multiple tables, you will probably want to
change the links at the top of the script to include links to all your
scripts; just edit the HTML in the header subroutine.


You can also modify the internal code templates inside FormGen 
itself, and your modifications will be reflected in every script you 
generate. Because FormGen is a Perl script, you simply edit the exe- 
cutable directly in a text editor. If you are really creative, you can 
get FormGen to output code in other languages by creating the 
appropriate templates. (See the FormGen code for the list of tokens 
it uses, which are uppercase strings enclosed in pound signs.) 


Caveats 


FormGen is intended for generating internal applications 
used by a trusted audience. Do not put unmodified FormGen 
scripts on the Internet for the following reason: any Web appli- 
cation that re-displays what the user submits is vulnerable to 
“cross-site scripting” in which specially crafted HTML and 
Javascript entered by users can be used to trick browsers into 
revealing sensitive information. You must strip out such infor- 
mation using regular expressions before re-displaying user-sub- 
mitted data. 
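
One common way to meet this requirement is to HTML-escape user-supplied values at the moment they are written into the page, rather than trying to strip markup with regular expressions. Note that this swaps in a different technique from the one named above, and that the scripts FormGen generates are Perl; the Python below is only a sketch of the idea, with the helper name render_cell invented for the example.

```python
import html


def render_cell(value):
    """Return an HTML table cell with the value escaped for safe display."""
    # quote=True also escapes single and double quotes, which matters if the
    # value is ever placed inside an HTML attribute.
    return "<td>" + html.escape(str(value), quote=True) + "</td>"


if __name__ == "__main__":
    print(render_cell("<script>alert(1)</script>"))
```

Escaping on output leaves the stored data untouched, so the same record can still be edited or exported without loss.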

Also, consider modifying the #!/usr/bin/perl -w line at the
top of the code. For example, consider removing the -w to turn off
warnings, which are written to your error_log file. Also consider
adding -T to enable taint checking, which ensures that the
script will not do anything unsafe with user-supplied data that has
not been filtered (“untainted”) through regular expressions.


Conclusion 

FormGen provides a quick and easy way to automatically gener- 
ate Web database applications using a simple text configuration file. 
Multi-table applications can be generated using lookups. By editing 
the generated Perl scripts, sophisticated custom functionality can be 
added (but obviously, such changes will not carry over if you re- 
generate the script). Ambitious users can also readily modify 
FormGen’s internal code templates, to change the code that 
FormGen generates. 


Robert J. Bond 3rd serves as Web/database programmer at College of Santa 
Fe in New Mexico, where he is also instructor and advisor in computer sci- 
ence. He thanks David Bond of EveryPage, Inc. for two years of support in 
the development of FormGen. 
TOOLS


Making a Dashboard Widget for Systems Administration Purposes
Mihalis Tsoukalos


With the release of Mac OS X 10.4 (a.k.a. Tiger), Apple
introduced a new feature called Dashboard. Dashboard is
like a second layer to the desktop that consists of widgets,
which are small, lightweight, task-specific applications. Figure
1 shows my personal Dashboard setup. When Dashboard is activated
or de-activated, its widgets are shown or hidden, respectively.
In this article, I will describe the construction of a widget
for systems administration purposes.


The Structure of a Widget

Widgets are grouped into directories that must have the .wdgt
extension. Widgets consist of an HTML page, JavaScript code for
making the widget dynamic, Cascading Style Sheets (CSS) commands, a
widget background image in PNG format, an icon image in PNG format,
and a compulsory property list file called Info.plist. The icon image
is what appears in the Dashboard Widget bar, and the property list
file contains required information such as the main HTML page for the
widget, the version of the widget, as well as other optional
information.

CSS is a popular Web standard for defining the appearance of HTML
pages; it usually comes in a separate file but can also be embedded
inside an HTML page. The use of CSS is not required, but it greatly
improves the look of HTML pages and therefore is highly desired. Last
but not least is the JavaScript code, which supports the dynamic
behavior of widgets. A widget can be tested during its development
phase using a Web browser such as Apple’s Safari provided that it
does not use some particular features such as the widget.system()
call.


The “Hello Sys Admin magazine” Widget

Before introducing the sys admin widget, I’ll present a very simple
one. This is the simplest widget that can be constructed, and it
just displays “Hello Sys Admin magazine” on the Dashboard. It
consists of the following components:


Default.png: The background image used in the HelloSysAdmin.html
page. The file name is mandatory.
Icon.png: The icon that is shown for each widget in the Dashboard
dock.
HelloSysAdmin.html: The main HTML page for the widget.
Info.plist: The property list file. Please note that the easiest way
to create a new Info.plist file is with the Property List Editor.


Figure 2 shows the contents of the property list file as shown inside
the Property List Editor, whereas the contents of the
HelloSysAdmin.html HTML file are shown in Listing 1. The Dashboard
Reference Guide lists all the allowed property list keys; Table 1
lists only the required keys along with their types and descriptions.
Also note that, for clarity, you can put the CSS information in
a separate file and utilize that file by including the following
statements inside the main HTML file of the widget:


<style>
@import "StyleSheetName.css";
</style>


Description of the Sys Admin Widget

The previous example is static, but with the assistance of the
JavaScript language, a widget can act dynamically (i.e.,
automatically) and periodically change the information that it
displays. Our widget is going to be dynamic, and its purpose is to
inform us about the uptime of our computer, its load average as well
as the total capacity and the free space of each of the mounted
devices. Its refresh rate is 5 seconds, because we do not want to
disturb our operating system all the time, but this can be changed at
will.


The Construction of the Sys Admin Widget 


The files that are part of the Sys Admin widget are shown in 
Figure 3. Next, I’ll talk about them in more detail: 


SA.html — The main HTML file for the widget. 

SA.css — The file with the stylesheet information for decorating 
our widget. 

Default.png — The background image of our widget (remember:
this file name is mandatory).

Icon.png — As mentioned earlier, this is the image of the widget
that is shown in the Dashboard dock. The file name is mandatory.

Info.plist — The property list file of the Sys Admin widget. 

SA.js — The JavaScript code used for the widget. 


Dynamic widgets need some specific JavaScript code and a few
mandatory practices to operate correctly. Let us look at the
SA.html file (Listing 2) in more detail. It is a regular HTML file. The
only requirement is that an element must be declared
inside the HTML code for presenting the output information. In our
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case, the element is called “OurInfo”, and it 
should be specified as read-only. 

In Listing 2, you can also see where in 
the HTML file the JavaScript and the CSS 
code files must be included. The HTML 
file looks very simple because the whole 
logic of the widget has been transferred 
inside the JavaScript code. This abstraction 


improves the readability and the maintain- 
ability of the code. 

I will now explain file SA.js (Listing 3) 
in more detail. Four functions, getData(), 
UpdateWidget(), show(), and hide() 
are declared inside the file. Function 
UpdateWidget() is called from the 
SA.html file upon the complete loading 


of the HTML file. The very important 
code line “if (window.widget)” checks 
whether Dashboard is active and defines 
the action that must be done when it is 
active or when it is hidden (e.g., the wid- 
get does not consume CPU time when 
Dashboard is hidden). The code line “if 
(window.widget)” actually assigns methods 


Listing 1 The HTML contents of a simple widget 
<!--
Creator: Mihalis Tsoukalos
File: HelloSysAdmin.html
Date: Wednesday 17 August 2005

This file is provided without any warranties
-->

<html>
<head>

<style>
body
{
margin: 0;
}

.helloSA
{
font: 20px "Lucida Grande";
font-weight: bold;
color: white;
text-align: center;
position: absolute;
top: 25px;
left: 18px;
width: 180px;
}
</style>

</head>
<body>

<img src="Default.png">

<div class="helloSA">Hello Sys Admin magazine!</div>
</body>

</html>


Listing 2 The HTML code of Sys Admin Widget

<!--
Creator: Mihalis Tsoukalos
File: SA.html
Date: Thursday 18 August 2005

This file is provided without any warranties.
This file is provided for demonstration purposes.
-->

<html>
<head>

<script type="text/javascript" src="SA.js"></script>
<style type="text/css">
@import "SA.css";
</style>

</head>

<body onLoad="UpdateWidget();">
<img src="Default.png">

<div class="SAtitle">Sys Admin magazine widget</div>

<!-- getData() writes its HTML into "OurInfo" through innerHTML, so the
     element is a display-only container rather than a form field. -->
<div class="SAtext">
<div id="OurInfo"></div>
</div>

</body>

</html>


Listing 3 The JavaScript code of Sys Admin Widget 


// Filename: SA.js
// The JavaScript code for the Sys Admin Widget
//
// Date: Saturday 27 August 2005
// Creator: Mihalis Tsoukalos
//
// This file is provided without any warranties.
// This file is provided for demonstration purposes.

var timerInterval = null;

function getData()
{
var OUTPUT_DATA = "";

// The UNIX uptime command contains both uptime and load average info.
var uptime = widget.system("/usr/bin/uptime", null).outputString;
var load_average = uptime;

// uptime data
OUTPUT_DATA += "<u>Uptime</u>: ";
pre = uptime.split("up");
uptime = pre[1];
sec = uptime.split(",");
uptime = '';

// Keep only the comma-separated fields that describe the uptime.
for (y = 0; y < sec.length - 1; y++)
{
if (sec[y].match("day") || sec[y].match("mins") || sec[y].match(":") ||
    sec[y].match("hrs") || sec[y].match("hour"))
{
uptime += sec[y];
}
}

OUTPUT_DATA += uptime;

// load average data
OUTPUT_DATA += "<br><u>Load average</u>: ";
pre = load_average.split("averages: ");
load_average = pre[1];
OUTPUT_DATA += load_average;

// Hard Disk Usage data
var df = widget.system("/bin/df -l -m", null).outputString;
// One array element per line of df output.
par = df.split("\n");
// Reuse df to collect the HTML table rows.
df = "";

OUTPUT_DATA += "<br><u>Disk information:</u>";

// Starting the df table.
OUTPUT_DATA +=
"<table bgcolor='#4F4F4F' border cellspacing=0 cellpadding=2>";

// The table header column.
OUTPUT_DATA += "<tr>";
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to “widget.onshow” and “widget.onhide” 
properties. 

Table 1 Required property list keys

Key | Type | Description
CFBundleName | String | The name of the bundle
CFBundleDisplayName | String | The name of the bundle that is displayed
CFBundleIdentifier | String | Apple’s own widgets are named com.apple.widget.<widgetname>. Other widgets are named similarly.
CFBundleVersion | String | The version information of the widget
MainHTML | String | The relative path to the widget’s main HTML file

Figure 1 Dashboard example

Figure 2 The contents of Info.plist for the “Hello Sys Admin magazine” widget
as shown in Property List Editor

Function show() sets the refresh interval for function
UpdateWidget(), which is 5000 milliseconds, or 5 seconds. Function
hide(), which is assigned to the previously mentioned property
“widget.onhide”, stops updating the widget when Dashboard is hidden.
The getData() JavaScript function does all the crucial work.

The HTML output code is stored in variable OUTPUT_DATA. Note
that the widget.system() function is only available to Dashboard
widgets and cannot be used or tested in a Web browser such as
Safari. It is a very powerful function because it allows the widget to
execute external commands. If you did not want to use JavaScript,
you could instead use an external Perl script that produces all the
HTML output!
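As a rough illustration of that idea, here is a minimal sketch in sh and awk rather than Perl. The function name df_to_html_rows is hypothetical, and the sketch assumes the six-column "df -m" layout (filesystem, size, used, available, capacity, mount point) with no spaces in the filesystem name; other systems print extra columns.

```shell
#!/bin/sh
# Hypothetical helper: read "df -m"-style text on stdin and print one
# HTML table row per file system. Assumes six columns, sizes in MB.
df_to_html_rows() {
    awk 'NR > 1 && NF >= 6 {
        # Fields 6..NF form the mount point (it may contain spaces).
        mount = $6
        for (i = 7; i <= NF; i++) mount = mount " " $i
        # Convert megabytes to gigabytes, as the widget does.
        printf "<tr><td>%s</td><td>%.2fGB</td><td>%.2fGB</td><td>%.2fGB</td><td>%s</td></tr>\n",
               mount, $2 / 1024, $3 / 1024, $4 / 1024, $5
    }'
}

# Possible use from a script called through widget.system():
#   /bin/df -l -m | df_to_html_rows
```

The widget would then only need to wrap the returned rows in a table and assign the result to the output element.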

Finally, document.getElementById("OurInfo").innerHTML =
OUTPUT_DATA; writes the contents of variable OUTPUT_DATA
into an HTML element called “OurInfo”. You must declare this element
in the HTML code of the widget or else you will get no output.
As you can see, SA.js uses the widget.system() call, and in
order to use the widget.system() call, the Info.plist file must contain
an entry named AllowSystem with the Boolean value of “yes”.
Listing 4 shows the small CSS code of our widget. For more
extended information about CSS, refer to the book Cascading Style
Sheets: The Definitive Guide from O’Reilly. Graphic files Default.png


Listing 3 continued

OUTPUT_DATA +=
"<td style='font-size:9px;color:lightgray;'>Mount Point</td>";
OUTPUT_DATA += "<td style='font-size:9px;color:lightgray;'>Size</td>";
OUTPUT_DATA +=
"<td style='font-size:9px;color:lightgray;'>Occupied</td>";
OUTPUT_DATA += "<td style='font-size:9px;color:lightgray;'>Free</td>";
OUTPUT_DATA += "<td style='font-size:9px;color:lightgray;'>Used</td>";
OUTPUT_DATA += "</tr>";

for (y = 1; y < par.length - 1; y++)
{
df += "<tr>";

// mount point
details = par[y].split("%");
df += "<td nowrap style='font-size:9px;align:center;'>";
df += details[1] + "</td>";

details = par[y].split(/\s+/);

// size
details[1] = parseFloat(details[1]) / 1024;
df += "<td nowrap style='font-size:10px;align:center;'>";
df += details[1].toFixed(2) + "Gb</td>";

// Occupied
details[2] = parseFloat(details[2]) / 1024;
df += "<td nowrap style='font-size:10px;align:center;'>";
df += details[2].toFixed(2) + "Gb</td>";

// free
details[3] = parseFloat(details[3]) / 1024;
df += "<td nowrap style='font-size:10px;align:center;'>";
df += details[3].toFixed(2) + "Gb</td>";

// used
df += "<td nowrap style='font-size:10px;align:center;'>";
df += details[4] + "</td>";

df += "</tr>";
}
OUTPUT_DATA += df;

// Finishing the df table.
OUTPUT_DATA += "</table>";

document.getElementById("OurInfo").innerHTML = OUTPUT_DATA;
}

function UpdateWidget()
{
if (window.widget)
{
widget.onshow = show;
widget.onhide = hide;
}
show();
}

function show()
{
getData();
if (timerInterval == null)
{
timerInterval = setInterval("UpdateWidget();", 5000);
}
}

function hide()
{
if (timerInterval != null)
{
clearInterval(timerInterval);
timerInterval = null;
}
}


Listing 4 The CSS code of Sys Admin Widget

/*
Creator: Mihalis Tsoukalos
File: SA.css
Date: Thursday 18 August 2005

This file is provided without any warranties.
*/

body
{
margin: 0;
}

.SAtitle
{
font: 12px "Lucida Grande";
font-weight: bold;
color: black;
text-align: center;
position: absolute;
top: 25px;
left: 44px;
width: 180px;
text-decoration: underline;
}

.SAtext
{
font: 10px "Lucida Grande";
font-weight: bold;
color: yellow;
position: absolute;
top: 25px;
left: 15px;
width: 180px;
}


Listing 5 The Info.plist contents of the Sys Admin Widget in
XML format

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>AllowSystem</key>
<true/>
<key>CFBundleDisplayName</key>
<string>Sys Admin Widget</string>
<key>CFBundleIdentifier</key>
<string>personal.widget.SysAdmin</string>
<key>CFBundleName</key>
<string>Sys Admin Widget</string>
<key>CFBundleVersion</key>
<string>1.0</string>
<key>CloseBoxInsetX</key>
<integer>16</integer>
<key>CloseBoxInsetY</key>
<integer>14</integer>
<key>MainHTML</key>
<string>SA.html</string>
</dict>
</plist>
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and Icon.png have been made with Adobe Photoshop CS2 for Mac. 
Note that the Icon.png file has to be 82x82 pixels. Refer to the Dashboard
Reference for the full requirements for those two image files.
Finally, Listing 5 shows the contents of the Info.plist file. This is actually
an XML text file, so you can also create it with a simple text
editor, but I think it is much easier to use the Property List Editor.


The Installation of a Widget 

It is easy to install a widget — just double-click it and a message 
asking whether you really want to install it will appear on screen. In 
general, Dashboard provides two places for a user to install widgets: 
the first one is directory ~/Library/Widgets, which is inside the user 
home directory, and the other is /Library/Widgets, which is in the root 
directory of the Mac OS X 10.4 boot hard disk. If you want a widget 
to be available only to you, you can install it in the location inside 
your home directory; otherwise, put it in the system-wide location. 
You can uninstall a widget by simply deleting it (i.e., moving it to the
Trash) from its installation directory or by using the Widget Manager. 
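
For scripted installs, copying the widget bundle into one of those directories has the same effect as far as file placement goes. The following is a minimal sketch; the function name install_widget and the bundle name SysAdmin.wdgt are assumptions used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: copy a widget bundle into a widget directory.
# $1: path to the bundle (for example SysAdmin.wdgt, an assumed name)
# $2: destination directory, defaulting to the per-user location
install_widget() {
    widget=$1
    dest=${2:-"$HOME/Library/Widgets"}
    mkdir -p "$dest" && cp -R "$widget" "$dest/"
}

# Per-user install:
#   install_widget SysAdmin.wdgt
# System-wide install (needs administrator rights):
#   install_widget SysAdmin.wdgt /Library/Widgets
```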

After successfully installing the Sys Admin widget, you will get 
a picture similar to Figure 4. The Widget Manager, which was 
added with the release of Tiger 10.4.2 update, improves the control 
and the security we have over widgets. 


Conclusions 


Apple wisely decided to include widgets in Tiger. Existing wid- 
gets serve us well, but we can also construct new widgets to meet 
our specific needs. The making of a widget is not that difficult; it 
mainly involves some HTML, JavaScript, and CSS knowledge, as 
well as a lot of imagination. 


Figure 3 The files of Sys Admin widget 
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Taming Nagios 
David Josephsen 


In the past few years, Nagios has become the industry standard
open source systems monitoring tool. If you’re using an open
source app to monitor the availability, state, or utilization of
your servers or network gear, then chances are you are using Nagios 
to do it. To those who have worked with it, this is no surprise. The 
lightweight design of Nagios offloads the actual query logic into 
“plug-ins”, which are easily created, modified, and re-purposed by 
sys admins. The lack of complex query logic leaves the Nagios dae- 
mon free to manage scheduling and notifications and to handle UI. 
Nagios’s “keep it simple” approach makes it straightforward to 
administer, network transparent, and amazingly flexible. 

Two excellent articles by Syed Ali in previous editions of Sys 
Admin covered the installation and configuration of Nagios. In this 
article, I’ll pick up where those articles left off and provide some cre- 
ative solutions to problems commonly faced by sys admins working 
with Nagios to monitor the health and performance of systems. 


Wrapping Around Plug-Ins 

In my experience at work and in the forums, I’ve noticed that sys
admins dealing with Nagios for the first time invariably ask two
questions. The first is “Why can’t I get any performance data?” The
Nagios daemon has hooks for exporting performance data that it
receives from the plug-ins to external programs. These hooks are
usually used to provide data to graphing programs like RRDTool or
MRTG. The problem lies in that the plug-ins themselves must provide
the performance data for this to work, and the preponderance of
plug-ins do not. Despite a very straightforward interface design, it
seems that performance data remains an afterthought in the
fast-paced world of Nagios plug-in development.

Thankfully, the missing support for performance data is not difficult
for blue collar sys admins to add. The Nagios daemon considers
anything after a pipe character in the plug-in’s normal output to be
performance data and exports it accordingly.

For example, if the plug-in’s output is “cpu:15%|15”, then
Nagios will place “cpu:15%” in the output pane of the Web GUI
and export “15” to whatever applications are configured to receive
performance data. So, on the forums, the common answer to this
question is to hack the source of the plug-in in question, and echo its
output twice. Since most plug-ins are written in C, this usually
means changing this:


printf("%s", output);


to something like this:


printf("%s|%s", output, output);


and recompiling the plug-in.

But we’re busy people, and we can do better than that. Listing 1
is a “generic plug-in wrapper”, written in sh, which will add
performance data support to any plug-in. It works by proxying the “real”
plug-in and capturing its output and exit code. Then, it simply
echoes the captured output twice and exits with the captured exit
code. To use it, copy it to the server’s plug-ins directory (usually
/usr/local/nagios/libexec) as “check_wrapper_generic”. Then, to
provide performance data for your “check_mem” plug-in, you would do:


cd /usr/local/nagios/libexec
ln -s check_wrapper_generic check_mem_wrapper


Then, replace all instances of check_mem with check_mem_wrapper in
your checkcommands.cfg file, and you’re done. You now have a
common, system-wide interface for selective support of performance data.

The second frequently asked question is “How do I create a service
that checks more than one port?” Nagios comes with check_tcp and
check_udp plug-ins that can be configured to check any single tcp or
udp port. The problem is that some applications span multiple ports,
and it feels kludgey to sys admins to configure multiple logical
services for a single physical entity. It’s not just aesthetics; a
junior admin or manager would easily be confused by three different
“Oracle” services in the Nagios UI. Does red in one mean that the
Oracle listener is down?

Listing 2 is another plug-in “wrapper”, written in bash (or any
shell with getopts support), which will wrap around the existing
check_(tcp|udp) plug-in to easily provide for an arbitrary number of
space-separated port numbers in a single service definition. Called
with -u, it wraps around check_udp. Called with -t, it wraps around
check_tcp. This way you can have a single service instantiation that
checks multiple ports. Other than multiple port numbers, and a switch
for tcp or udp, its syntax is very similar to the normal check_tcp
plug-in. For example, the following:


check_multi_tcp -H server1 -t -p "21 22"


would check ports 21 and 22 on server1. Call it with -h for a technical
description of its syntax.
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In Search of Humanity 


Tools like NACE, which create config files based on automated 
network discovery, ease the headache of managing the config files, 
but some configuration will always be manual. Automatic discovery
of systems is one thing; automatic discovery of human contacts, and
of the services for which they want to receive notifications, is
quite another. Or is it?

Tying Nagios to listmanager software, like MajorDomo or 
EZMLM, gives users the capability to simply tell us what they’re 
interested in by “subscribing” to the Nagios services from which 
they want notifications. We configure Nagios to know about the lists 
and mail them instead of the contacts directly, thereby ceasing our 
contact configuration conundrums. 

Getting this to work is a three-step process. Listing 3 is a shell 
script that, given a list of services in the form of “server service” on 
stdin, will create an EZMLM list for each one. There is a filesystem 
interface to the current Nagios system state in your Nagios var 
directory. Usually this is “/usr/local/nagios/var/status”. So you can 
generate the input for Listing 3 with: 


find /usr/local/nagios/var/status -type f | sed \ 
@ 'S/.*\/\(LAV/IVANV/AC EAN IMANDS/AL 27! 
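
A sketch of the kind of pipeline meant here follows. It assumes that every status file sits at <status directory>/<host>/<service>, which is an assumption about the layout and not something the text above confirms; the function name list_services is also hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print one "host service" pair per status file.
# Assumes the layout <status_dir>/<host>/<service>.
list_services() {
    # Keep only the last two path components, separated by a space.
    find "$1" -type f | sed 's|.*/\([^/]*\)/\([^/]*\)$|\1 \2|'
}

# Possible use:
#   list_services /usr/local/nagios/var/status
```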


EZMLM lists consist of a lot of files, but some of these files are 
universal; they will be the same for every list. So, Listing 3 uses a 
template. Every list hard-links to the files in the template, thus saving 
four inodes per service. These “universal” files are headerremove, 
inhost, lock, outhost, and public. Check the EZMLM docs for the 
proper configuration of these files and place them in the location 
expected by the script. Inhost and outhost need the FQDN of your 
Nagios box, and headerremove probably needs to contain at least 
“return-path” and “return-receipt-to”, depending on your network 
specifics. 
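
The hard-linking step could look roughly like the sketch below. The function name link_template_files and both directory arguments are hypothetical; only the five file names come from the text, and both directories must be on the same filesystem for hard links to work.

```shell
#!/bin/sh
# Hypothetical helper: hard-link the shared EZMLM files from a template
# directory into a list directory instead of copying them.
# $1: template directory, $2: list directory
link_template_files() {
    template_dir=$1
    list_dir=$2
    for f in headerremove inhost lock outhost public; do
        ln "$template_dir/$f" "$list_dir/$f" || return 1
    done
}
```

Because the links share one inode per file, a later change to a template file (a new FQDN in inhost, for example) is visible in every list at once.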

Once the lists are created, the second step is making Nagios use 
them. To begin, add a contact for a generic forwarder: 


define contact{
use default-template
contact_name listforwarder
alias Ezmlm relay script
service_notification_commands service-relay-to-list
host_notification_commands host-relay-to-list
email nothing@nowhere.com
}


Then add that contact to every configured “contactgroup”. 
Finally, add the service-relay-to-list and host-relay-to-list com- 
mands in your misccommands.cfg: 


# for forwarding to EZMLM Lists
define command{
command_name host-relay-to-list
command_line /usr/bin/printf "%b" "Host '$HOSTALIAS$' \
is $HOSTSTATE$\nInfo: $HOSTOUTPUT$\nTime: \
$DATETIME$\n\n\nUser:$CONTACTNAME$" | \
/usr/local/nagios/sbin/listforwarder.sh \
"$HOSTNAME$" "host" "`echo \
$NOTIFICATIONTYPE$ | /usr/bin/cut \
-c '1-3'`: $HOSTNAME$ $HOSTSTATE$" \
2>&1 | logger -p mail.info
}

# for forwarding to EZMLM Lists
define command{
command_name service-relay-to-list
command_line /usr/bin/printf "%b" "Service: \
$SERVICEDESC$\nHost: \
$HOSTNAME$\n$HOSTALIAS$\nAddress: \
$HOSTADDRESS$\nState: $SERVICESTATE$\nInfo: \
$SERVICEOUTPUT$\nDate: \
$DATETIME$\n\n\nUser:$CONTACTNAME$" | \
/usr/local/nagios/sbin/listforwarder.sh \
"$HOSTNAME$" "$SERVICEDESC$" "`echo \
$NOTIFICATIONTYPE$ | /usr/bin/cut -c \
'1-3'`: $HOSTNAME$/$SERVICEDESC$ \
$SERVICESTATE$" 2>&1 | logger -p mail.info
}


Listing 4 is the listforwarder.sh script referred to in the
commands above. With this in place, people interested in getting
notifications from the “CPU” service on “server1” can send an email
to “server1-CPU-subscribe@your.nagios.box”. The built-in
contacts interface is not affected by this arrangement, so you can still
manually maintain groups of contacts within Nagios where it
makes sense to do so.

Now that all the lists have been created and Nagios is configured 
to make use of them, the third and final step is to ensure new lists 
are created when new services and hosts are added going forward. 


Failsafe Changes 


Changing the configs is dangerous busi- 
ness, because errors in the config files bring 
down the running Nagios daemon, and it 
will stay down for as long as it takes you to 
correct the bugs. Arrangements like the 
mailing list tip above further complicate 
things. Admins simply won’t remember to 
create new lists for new hosts and services. 
A simple shell script could satisfy that 
requirement by checking for changes to the 
configs and creating new mailing lists 
accordingly. In practice, however, it’s not 
so easy. You need to either run the script 
from cron, risking a race condition if user 
requests beat the cron job, or proceduralize 
changes to the production Nagios daemon 
to make sure nobody forgets to run the 
script. So while we’re creating procedures, 
it would be nice if we could also provide 
for some rollback functionality and error 
checking. 

We do this with the help of CVS. Our 
production Nagios configs are checked into 
a CVS repository, and that repository is 
checked out in /etc/ on our production 
server. This enables admins to work on the 
files offline, ensures that they don’t step on 
each other by directly editing the live con- 
figs in production, and provides a revision 
history they can fall back on if there are 
problems. Once their changes are made and 
committed, our admins use the shell script 
in Listing 5 on the production server to actu- 
ally change the running Nagios daemon. 
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They’re provided sudo access to this script, so that they can’t do it 
any other way. 

This shell script gets their changes from CVS, checks them to
make sure there are no errors (with nagios -v), safely HUPs the
running Nagios daemon, and then creates any necessary EZMLM
lists by providing input to the script in Listing 3. If bugs are found in
the configs, it dies without stopping Nagios. If unauthorized changes
are made, they are overwritten. Providing all of this in a single
command made the process so easy and transparent for our admins that
many of them assumed Listing 5 was a program provided by the
Nagios installation tarball and wondered what had happened to it
when they moved to other Nagios implementations. Indeed, once
you have a mechanism for failsafe changes, it seems obvious for one
to be built in.
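
The core of that "check first, reload only on success" rule can be sketched as follows. This is an illustration and not the script from Listing 5; the function name safe_reload is hypothetical, and the config and lock file paths in the usage comment are assumptions about a typical install.

```shell
#!/bin/sh
# Hypothetical helper: run a verification command and run the reload
# command only if verification succeeds.
# $1: command that checks the configs, $2: command that reloads Nagios
safe_reload() {
    verify_cmd=$1
    reload_cmd=$2
    if ! sh -c "$verify_cmd" >/dev/null 2>&1; then
        echo "config check failed; running daemon left untouched" >&2
        return 1
    fi
    sh -c "$reload_cmd"
}

# Possible use (both paths are assumptions):
#   safe_reload "nagios -v /etc/nagios/nagios.cfg" \
#       'kill -HUP `cat /usr/local/nagios/var/nagios.lock`'
```

A full version would run the CVS update before the check and the list creation after the reload, as described above.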


Getting What They Want 


“I’d like notification A to be sent to my pager, during the day,
and to my email at night, and notification B sent to my email, but 
only on weekends, and never on the 15th of the month, unless it’s 
the second Wednesday in May.” Sound familiar? Don’t be upset, 
this is actually a good thing. You are experiencing the symptoms 
of a monitoring system that actually works. Complex configura- 
tions like this are at least possible with Nagios, if not easy. You'll 
need multiple instantiations of the same service for different con- 
tacts at different time periods. 

We’ve had some luck outsourcing some of this conditional logic 
to special mail aliases. The condition for the email to be sent at 
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night, for example, could be offloaded to a Qmail alias called 
“qmail-nighttime-default” containing: 


|condredirect $EXT2 /usr/local/bin/nightchecker 


Figure 1 A sample “business logic”, or “stop-light”, interface using check_cluster2
as its back end.
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Figure 2 Click through the business logic interface to this NagVis diagram. This 
makes it easy for management to see the impact of the problem. 



Nightchecker is a simple shell script that makes sure it’s currently 
“night time” (as defined by our SLA) and exits accordingly: 


#!/bin/sh
myhour=`date '+%k'`
if [ $myhour -lt '7' ]
then
if [ $myhour -ge '0' ]
then
exit 0
fi
elif [ $myhour -gt '19' ]
then
exit 0
else
exit 99
fi


So, with this in place, you can have a single
service instantiation that emails
user-nighttime@your.nagios.box, and the mail
will only make it if it’s currently night
time. The nice thing about these aliases is
that you can stack them. So, given another
called .qmail-weekends and yet another
called .qmail-secondWedInMay, your service
could email:
user-nighttime-weekends-secondWedInMay@your.nagios.box.
It won’t replace timeperiods.cfg, but it has helped
us out of some tight spots.
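
A checker for the weekends alias could follow the same pattern as nightchecker. The sketch below is hypothetical (the function name is_weekend is not from our setup) and takes the day number as an argument so that it can be tried without waiting for Saturday.

```shell
#!/bin/sh
# Hypothetical helper, modeled on nightchecker's exit codes:
# returns 0 on Saturday or Sunday and 99 on any other day.
# $1: ISO day of the week, 1 (Monday) through 7 (Sunday)
is_weekend() {
    case "$1" in
        6|7) return 0 ;;
        *)   return 99 ;;
    esac
}

# Possible use in a standalone checker script:
#   is_weekend "`date '+%u'`"
#   exit $?
```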


Acknowledged 


Nagios allows alert recipients to “acknowl- 
edge” problems, thereby stopping the recur- 
ring problem notifications while optionally 
providing a helpful comment to your fellow 
admins. I find that managers love the thought 
of this feature but find it difficult to implement 
in practice. It’s just hard to take the time to log 
into a Web interface to acknowledge a prob- 
lem while a production system is down. 

We thought of a quicker way. Listing 6 
is a shell script that enables alert recipients 
to acknowledge problems by replying to 
the notification email they receive. It 
works by way of the Nagios command file, 
so be careful with this one; it puts data 
controlled by unknown agents into your 
command fifo. 

The script is designed to be given the
email on stdin. It checks the message to
make sure it’s not a bounce and parses various
pieces of information out of it, such as
the hostname, service name, and an optional
comment from the person who ack’d. You’ll
need to make sure that the return path of
notifications is a routable address that
dumps the reply into this script. You’ll also
want to add the “CONTACTNAME” macro
to your notify commands in misccommands.cfg,
prefixed with “User:”. Here’s
what ours looks like:
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define command{ 
command_name host-notify-by-muttpager-short
command_line /usr/bin/printf "%b" "Host '$HOSTALIAS$' \
is $HOSTSTATE$\nInfo: $HOSTOUTPUT$\nTime: \
$DATETIME$\n\n\nUser:$CONTACTNAME$" | \
/usr/bin/mutt -F /etc/nagios/muttrc.cfg \
-e 'set realname=""' -s "`echo \
$NOTIFICATIONTYPE$ | /usr/bin/cut -c \
'1-3'`: $HOSTNAME$ $HOSTSTATE$" \
$CONTACTPAGER$ 2>&1 | logger -p mail.info
}
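
To show what ends up in the command file, here is a minimal sketch of building a service acknowledgement line. It is not the script from Listing 6. The function names are hypothetical, the field order is the ACKNOWLEDGE_SVC_PROBLEM format as I understand it from the Nagios 2.x external command documentation (check it against your version), and the three "1" values (sticky, notify, persistent) are arbitrary choices. Because the fields come from email, the sketch strips semicolons and line breaks before using them.

```shell
#!/bin/sh
# Hypothetical helper: remove characters that could inject extra
# fields or extra commands into the Nagios command file.
clean() {
    printf '%s' "$1" | tr -d ';\n\r'
}

# Hypothetical helper: print one acknowledgement line.
# $1: Unix timestamp, $2: host, $3: service, $4: author, $5: comment
format_svc_ack() {
    printf '[%s] ACKNOWLEDGE_SVC_PROBLEM;%s;%s;1;1;1;%s;%s\n' \
        "$1" "$(clean "$2")" "$(clean "$3")" "$(clean "$4")" "$(clean "$5")"
}

# Possible use (the command file path is an assumption):
#   format_svc_ack "`date +%s`" server1 CPU dave "on it" \
#       >> /usr/local/nagios/var/rw/nagios.cmd
```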


Capturing Business Logic 
Management, for the most part, is interested in system health in 
the context of business processes. They don’t care that the smtp 
process on Mail4 is down; they care about whether email is cur- 
rently working in general. Red light or green light — is “email” 
working right now? Capturing business logic is, in my opinion, the 
holy grail of network and systems monitoring, and the big com- 
mercial monitoring apps (again, in my opinion) do a woefully 
inadequate job at it. Nagios doesn’t quite have the built-in hooks 
to capture the business processes around its host and service defini- 
tions either, but there are some things that can help you get close. 

The concept of “email” as a business process is, in reality, a 
myriad of complex interactions between systems. To answer the 
“red light or green light” question, you need to aggregate the 
status of many services on numerous hosts into a singularly 
instantiated entity. 

Check_cluster2, in the contrib direc- 
tory can do this for you to some degree. 
Check_cluster2 was written to report the 
overall status of a cluster by checking the 
status information of each individual 
host or service cluster element. It works 
by simply taking a definition of the 
“cluster” as a list of services and hosts, 
and exiting 0 if they’re all up. It checks 
their status using the SERVICESTATEID 
and HOSTSTATEID macros. It’s not a huge 
cognitive leap to simply consider “email” as 
a cluster of services and write the definition 
accordingly. 

That gets you 90% of the way there, 
but there are two big limitations to 
check_cluster2. The first is that you can- 
not define conditional logic. Either all the 
cluster elements are up, or they aren’t, and 
this doesn’t reflect the reality of business 
processes very well. We’d like to see a 
version with definition syntax similar to 
lisp conditionals, or ldap search syntax. 
Something like this: 


(|| (host1-smtp)(host2-smtp) )


would reflect that either the smtp service on
host1 or host2 could be up and “email”
would still be up. The second limitation is
that since check_cluster2 uses internal
Nagios macros, it must be used from within
Nagios and won’t allow itself to be scripted
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externally. I’d like to see a version that used the filesystem interface 
instead. 
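
The conditional logic we have in mind is simple to express once the state IDs are available as plain numbers (0 is OK, 1 is WARNING, 2 is CRITICAL, 3 is UNKNOWN). The following sketch is hypothetical and is not part of check_cluster2; any_ok and all_ok are names made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers that aggregate Nagios state IDs passed as
# arguments and return a plug-in style exit code.

# "Or": OK (0) if at least one element is OK, otherwise CRITICAL (2).
any_ok() {
    for state in "$@"; do
        [ "$state" -eq 0 ] && return 0
    done
    return 2
}

# "And": OK (0) only if every element is OK, otherwise CRITICAL (2).
all_ok() {
    for state in "$@"; do
        [ "$state" -eq 0 ] || return 2
    done
    return 0
}

# Possible use: "email" stays green while either smtp service is up.
#   any_ok "$host1_smtp_state" "$host2_smtp_state"
```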

In reality, “email” can’t be represented by a red or green light. Its 
health can only be measured by degree. But that won’t stop man- 
agement from asking you to give them the stop light. So, when you 
turn it red, you had better be able to explain why. Nagvis, a PHP 
program from some fellows in Germany, can help you answer that 
question. 

Nagvis allows you to easily animate Visio-style diagrams with
real-time information from Nagios; it can give you graphical “click- 
through” explanations of your aggregation decisions. Using motion 
gifs, “lights” that correspond to hosts and services can be made to 
blink on and off. It’s like catnip for managers, but at the price of yet 
another config file. 

Combining check_cluster2 and Nagvis gets us some rudimentary
management views. For example, Figure 1 is an example of a
simple “Business Logic” interface. Several important business
functions are listed, along with their status (red, yellow, or green).
If someone wanted to know what “Corporate Email” consisted of,
they could click through it to the Nagvis diagram depicted in
Figure 2.

Each green dot on the Nagvis diagram is configured to point to 
what we call a “rollup” service. It’s actually a check_cluster2 ser- 
vice configured to watch every “email-related” service on its parent 
server. If any of these services goes down, so does the rollup ser- 
vice, and the Nagvis diagram responds with a flashing red dot like 
the one next to the “deprecated” server in Figure 2. 
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Listing 1 This wrapper will add performance data output 
to Nagios plug-ins that don't already have it. 


#!/bin/sh
#a wrapper which adds perfdata functionality to any nagios plugin
#link pluginName_wrapper to this script for it to work
#for example, if you want to enable perfdata for check_mem
#you would 'ln -s check_wrapper_generic check_mem_wrapper'

#get rid of the 'wrapper' on the end of the name
NAME=`echo $0 | sed -e 's/_wrapper//'`

#call the plugin and capture its output
OUTPUT=`${NAME} $@`

#capture its return code too
CODE=$?

#parrot the plugin's output back to stdout twice, separated with a pipe
echo "${OUTPUT}|${OUTPUT}"

#exit with the same code that plugin would have exited with
exit ${CODE}
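
To see what the doubled output means on the receiving side, the pipe convention can be imitated in a few lines of sh. Nagios does this split internally; the function below is only an illustration, and its name and the two variables it sets are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split one line of plug-in output at the first
# pipe character. Sets $text (shown in the Web GUI) and $perfdata
# (exported to performance data consumers).
split_plugin_output() {
    line=$1
    case "$line" in
        *"|"*)
            text=${line%%|*}      # everything before the first pipe
            perfdata=${line#*|}   # everything after the first pipe
            ;;
        *)
            text=$line            # no pipe: no performance data
            perfdata=""
            ;;
    esac
}

# Possible use:
#   split_plugin_output "cpu:15%|15"
#   echo "$text"       # cpu:15%
#   echo "$perfdata"   # 15
```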


Listing 2 This script provides the ability to create single 
services that check more than one port by wrapping 
around check_tcp and check_udp. 


#!/bin/bash 
#call check_tcp once for each port. aggregate the result 
#dave josephsen 

HOME='/usr/local/nagios/libexec' 
PROTO='Nullz0r' 
EXITPROB=3 #nagios UNKNOWN state 

printusage () 
{ 
  echo "this plugin calls check_tcp once for each port" 
  echo "usage:" 
  echo "check_multi_tcp -H host -u|-t -p \"port [port] ...\"" 
  echo "-h : print this message" 
  echo "-H hostname: The hostname of the box you want to query \
(default localhost)" 
  echo "-p port number: A space separated list of port numbers" 
  echo "-t wrap around check_tcp" 
  echo "-u wrap around check_udp" 
  exit ${EXITPROB} 
} 

while getopts ":hH:utp:" opt 
do 
  case $opt in 
    h ) printusage;; 
    H ) HOST=${OPTARG};; 
    p ) PORT=${OPTARG};; 
    u ) PROTO='udp';; 
    t ) PROTO='tcp';; 
    ? ) printusage;; 
  esac 
done 

if echo "${PROTO}" | fgrep -q 'Nullz0r' 
then 
  echo "ERROR: either -u or -t required" 
  echo 
  printusage 
fi 

for i in `echo ${PORT}` 
do 
  ${HOME}/check_${PROTO} -H ${HOST} -p ${i} >/dev/null 
  if [ "$?" -ne 0 ] 
  then 
    echo "port ${PROTO}/$i is not open" 
    exit 2 
  fi 
done 

echo "all ports are open" 
exit 0 
Conclusion 


Nagios is a great tool for monitoring systems and networks. I 
hope I’ve given you some ideas that will help you expand its use, 
manage its complexity, and most of all, make your life easier. 


References 


Check_cluster2 plug-in — 
http://nagios.sourceforge.net/docs/2_0/clusters.html 

CVS — http://www.nongnu.org/cvs/ 

EZMLM — http://www.ezmlm.org/ 

Majordomo — http://www.greatcircle.com/majordomo/ 

MRTG — http://people.ee.ethz.ch/~oetiker/webtools/mrtg/ 

Mutt — http://www.mutt.org/ 

NACE — http://www.adamsinfoserv.com/AISTWiki/bin/ \ 
view/AIS/NACE 

Nagvis — http://www.nagvis.org/ 


Listing 3 This script creates an EZMLM mailing list for a 
given service on a given host. Users can then “subscribe” 
to this service, if they are interested in knowing its status. 


#!/bin/sh 
#given list stuff on stdin, create list stuff 

HOME='/usr/local/nagios/lists' 
SERVER_ADDR='your.nagios.fqdn' 
TEMPLATE="${HOME}/LIST_TEMPLATE" 

while read i 
do 
  #figure out what the servername and service names are 
  SERVER=`echo $i | cut -d' ' -f1` 
  SERVICE=`echo $i | sed 's/\([^ ]\+\) \(.*\)/\2/' | tr \
    '[:upper:]' '[:lower:]' | tr ' ' '_'` 

  echo 
  echo "creating ${SERVER}-${SERVICE}" 

  #create the list 
  ezmlm-make ${HOME}/${SERVER}-${SERVICE} \
    /var/qmail/alias/.qmail-${SERVER}-${SERVICE} \
    ${SERVER}-${SERVICE} ${SERVER_ADDR} 

  #link in the template files 
  for j in `ls ${TEMPLATE}` 
  do 
    rm ${HOME}/${SERVER}-${SERVICE}/$j 
    ln $TEMPLATE/$j ${HOME}/${SERVER}-${SERVICE}/$j 
  done 

  #add the replyto header 
  echo "Reply-To: <${SERVER_ADDR}>" >> ${HOME}/${SERVER}-${SERVICE}/headeradd 

  #fix the perms 
  chown -R alias.qmail ${HOME}/${SERVER}-${SERVICE}/ 
  chmod -R g+rwX ${HOME}/${SERVER}-${SERVICE}/ 
done 

#fix the alias perms 
chown -R alias.qmail /var/qmail/alias 


Listing 4 Listforwarder.sh is the glue between Nagios 
and EZMLM. 


#!/bin/sh 
# Takes three args 
#1 is the hostname 
#2 is the service name ( or "host" if it's a host alert ) 
#3 is the subject line of the mail we want to send (message body 
#  comes on stdin) 

SERVER=`echo ${1} | tr '[:upper:]' '[:lower:]'` 
SERVICE=`echo ${2} | tr '[:upper:]' '[:lower:]' | tr ' ' '_'` 

cat | /usr/bin/mail -s "${3}" "${SERVER}-${SERVICE}@your.nagios.box" 
Listing 5 Failsafe updates to the Nagios Config Files are 
provided by this script, which gets the changes from CVS, 
checks for errors, and creates new mailing lists if necessary. 


#!/bin/sh 
# Get changes to the configs 
# Check them for errors 
# HUP Nagios 
# Create mailing lists as required. 

HOME='/usr/local/nagios' 
BIN="${HOME}/bin" 
CFG='/etc/nagios' 
STATUS="${HOME}/var/status" 
LISTHOME='/usr/local/nagios/lists' 
LISTMAKER="${BIN}/listmaker.sh" # <--the script from Listing 3 

############ Get changes to the configs 
echo "refreshing the conf files from CVS" 
cd /etc/nagios/ && cvs update -dAC 
chown nagios.nagios /etc/nagios/* 

############ Check them for errors 
echo "checking for configuration errors" 

ERRORS=`${BIN}/nagios \
  -v ${CFG}/nagios.cfg | grep 'Total Errors' | awk '{print $3}'` 

if [ -z "$ERRORS" ] || [ "$ERRORS" -gt 0 ] 
then 
  echo "ERRORS DETECTED" 
  ${BIN}/nagios -v ${CFG}/nagios.cfg 
  exit 1 
fi 

echo "no errors found =-] " 
############ HUP Nagios 
echo "stopping nagios" 
/etc/init.d/nagios stop 

echo "waiting for children to die..." 
sleep 3 

CHILDREN=`ps -A | grep nagios | wc -l` 
COUNTER=0 

while [ ${CHILDREN} -gt 0 ] 
do 
  echo "still ${CHILDREN} children alive, waiting..." 
  sleep 3 
  CHILDREN=`ps -A | grep nagios | wc -l` 
  COUNTER=$(($COUNTER+1)) 
  if [ "$COUNTER" -gt 4 ] 
  then 
    echo "Killing impolitely, we've been waiting too long" 
    killall -9 nagios 
  fi 
done 

echo "children dead, restarting parent" 
/etc/init.d/nagios start 


############ Create mailing lists as required. 
echo -n "Did you add any new hosts or commands? (y/n) > " 
read REPLY 

if echo ${REPLY} | egrep -q '^[yY]' 
then 
  find ${STATUS} -type f | sed -e \
    "s/.*\/\([^\/]\+\)\/\([^\/]\+\)$/\1 \2/" | while read i 
  do 
    HOST=`echo $i | cut -d' ' -f1 | tr '[:upper:]' '[:lower:]'` 
    SERVICE=`echo $i | cut -d' ' -f '2-' | tr '[:upper:]' \
      '[:lower:]' | tr ' ' '_'` 

    if [ "${HOST}" != "${OLDHOST}" ] #must be a new host 
    then 
      if [ ! -e "${LISTHOME}/${HOST}-host" ] 
      then 
        echo "${HOST} host" | $LISTMAKER 
      fi 
      OLDHOST=${HOST} 
    fi 

    if [ ! -e "${LISTHOME}/${HOST}-${SERVICE}" ] 
    then 
      echo ${HOST} ${SERVICE} | $LISTMAKER 
    fi 
  done 
fi 


Listing 6 This script implements epager acknowledgments. 
With this in place, techs can reply to pages from Nagios to 
acknowledge service alerts. 


#!/bin/sh 
#a script to parse nagios acknowledgements from epagers. 

CMD='/var/nagios/rw/nagios.cmd' 
TMP='/tmp' 
TFILE=`/bin/tempfile -d ${TMP}` 

while read i 
do 
  echo $i >> ${TFILE} 
done 

if egrep -i '^Return-Path: <>' ${TFILE} || egrep -i \
  '^from: .*MAILER.*DAEMON' ${TFILE} 
then 
  echo "Is a bounce" 
  exit 0 
fi 

MESSAGE=`egrep -i '^ack:' ${TFILE} | cut -d: -f2` 
SERVICE=`egrep 'Service: ' ${TFILE} | cut -d: -f2 | sed -e 's/^ //'` 
HOST=`egrep 'Host: ' ${TFILE} | cut -d: -f2 | sed -e 's/^ //'` 
FROM=`egrep '^From: ' ${TFILE} | cut -d: -f2 | sed -e 's/^ //'` 
USER=`egrep 'User:' ${TFILE} | cut -d: -f2 | sed -e 's/^ //'` 
DATE=`date '+%s'` 

if [ -z "$HOST" ] 
then 
  echo "does not appear to be an ack message" 
  exit 0 
fi 

if [ -z "$MESSAGE" ] 
then 
  MESSAGE='Acknowledged via epager, no details given' 
fi 

#2;1;0 -- 2=enable/disable sticky ack; 1=send notification; 
#  0=persistent comment 
#[1089382177] ACKNOWLEDGE_SVC_PROBLEM;server1;SSH;2;1;0;Dave 
#  Josephsen;heres my text 

echo "[${DATE}] ACKNOWLEDGE_SVC_PROBLEM;${HOST};${SERVICE};2;1;0;${USER};${MESSAGE}" >> $CMD 

echo "[${DATE}] ADD_SVC_COMMENT;${HOST};${SERVICE};1;${USER};\
ACKNOWLEDGEMENT: ${MESSAGE}" >> $CMD 

rm ${TFILE} 
Trap Customization in an Enterprise OpenView 
Operations/NNM Environment 


Andy Yuen 


Many of the enterprise customers I’ve worked with manage 
a large number of servers and network devices using the 
scalability features of OpenView Operations (OVO) and 
Network Node Manager (NNM). They tend to end up with a con- 
figuration similar to that depicted in Figure 1. OVO is used as the 
manager of managers responsible for centralized event browsing 
and the management of servers with multiple NNM collection 
stations handling SNMP events from network devices. 
Management of SNMP devices is performed on the collection sta- 
tions, but OVO still needs to be informed of important events so oper- 
ations staff will be notified of potential problems to investigate and 
resolve. With so many messages coming into the OVO event system, 
unabated, the events in the browser will grow and grow making it 
almost impossible for staff to make sense of what the system is telling 
them. Consequently, mechanisms are required to automatically 
remove some of the alarms reported in the OVO event browser (i.e., 
change them from active to history events) when the alarm condition 
no longer exists. Making this happen implies the need to define a 
clearing event for every alarm event wherever possible. 


Possible Solutions 

OVO and NNM both have in-built facilities to partially address 
this problem. Let’s examine them in turn. OVO comes with a set 
of templates for system management that the administrator can 
push out to OVO agents to monitor the servers on which they are 
running. These templates have already defined clearing events for 
many of the alarm conditions. OVO Smart Plug-ins are additional 
templates/monitors that can be selectively installed to handle 
application monitoring such as Exchange server, Oracle database, 
etc. Again, these templates/monitors already have clearing events 
defined. If templates for the events that you want to manage are 
not included in these pre-built templates, you must create them 
yourself using the OVO graphical user interface. 

NNM comes with Event Correlation Services (ECS). ECS contains 
some pre-built circuits to avoid the occurrence of event storms 
when a main router goes down, generating a large number of 
unreachable alarms for the downstream devices. These pre-built 
circuits include: 


* ConnectorDown 

* MgXServerDown 

* Pairwise 

* RepeatedEvent 

* ScheduledMaintenance 


The “Pairwise” circuit does exactly what we want. This pairwise 
correlation matches a clearing (parent) event to one or more 
previously occurring alarm (child) events. Alarm events can be 
configured to: 


* Display in the Alarm Browser while awaiting arrival of a clearing 
  event. 

* Display in the Alarm Browser only if the specified time window 
  is exceeded without the arrival of a matching clearing event. 

* Remove from the Alarm Browser the alarm event, or set it to 
  Acknowledged upon receiving a clearing event. 


Because OVO includes a copy of NNM, ECS is also available to OVO. 
Let’s examine how we can apply these built-in facilities in the 
environment described previously. For system management, the 
decision is simple: either use the pre-built templates, or create your 
own since there is only one instance of OVO running as the MOM. 
In handling SNMP traps generated by network devices or 
hosts, we have a dilemma. 

Configuring the Pairwise circuit on the NNM collection stations 
will cause correlation to occur in the NNM collection station’s 
event browser only. It will not affect the OVO event browser. To 
use ECS on OVO, we must forward the SNMP traps to OVO, 
defeating the purpose of using collection stations for scalability 
reasons. If all traps are forwarded to OVO, there is no 
offloading of processing to the collection stations. 

Another limiting factor is that, for an event to appear in the OVO 
event browser, the node generating the event must be configured in 
the OVO Node Bank. In an enterprise environment, this is not always 
feasible. Customers may want to channel events from a set of related 
devices via a gateway node that has already been configured in the 
Node Bank. The devices generating the SNMP traps may not be 
configured in the OVO Node Bank because the administrator does not 
want to clutter the Node Bank. These devices may not even appear in 
the OVO object database by design (usually for scalability reasons). 
I will show an example in a later section. 


Approach Taken 

The approach taken to resolve this dilemma is to use a combi- 
nation of OVO custom template and OVO’s opcmsg command. 
The idea is to configure automatic actions in the event system on 
the NNM collection stations. The automatic actions convert the 
content of the trap to a corresponding opcmsg command and 
invoke it to send the information to OVO. The event forwarding 
process is as follows: 


1. NNM collection station receives a trap that needs forwarding to 
   OVO. 

2. NNM’s event system executes the automatic action, a custom Perl 
   script that packages the trap information into an opcmsg command 
   and executes it to forward the event to OVO. The reason to 
   execute a custom Perl script as an automatic action and not 
   execute the opcmsg command directly is that the Perl script can do 
   dynamic severity translation based on a selected trap varbind 
   instead of hard-coding a fixed OpenView severity. However, the 
   custom Perl script can also be configured to send the opc message 
   at a fixed OpenView severity. By supporting dynamic severity 
   translation, the same trap can be used as both the alarm and 
   clearing event depending on the severity value contained in the 
   varbind. This feature is important because many MIBs do not have 
   separate alarm and clearing events but rely on the severity 
   contained in a particular varbind to indicate the kind of event. 

3. The opc message is sent to OVO using opcmsg. 

4. The message text prefix string (described next) triggers the 
   custom template. 

5. The template causes OVO’s message correlation system either to 
   add the event to the OVO active event browser for an alarm event 
   or to remove an existing alarm from the browser for a clearing 
   event. 


Figure 1 Common enterprise OVO/NNM environment 


Figure 2 File upload page 


The opcmsg command is available on all servers with OVO agent 
installed. You may want to look up the manpage on opcmsg for a full 
description on its command syntax. 
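
To give a feel for what the automatic action ends up running, here is a minimal shell sketch that assembles an opcmsg command line from its fields and prints it instead of executing it. The function name build_opcmsg and the node name gateway01 are made up for the example, and the path to opcmsg will differ between installations; the "-AaYy-" prefix is the one used in this article. The real trapapp-generated script is written in Perl and does considerably more.

```shell
#!/bin/sh
# Hypothetical helper: assemble (but do not run) an opcmsg command line.
# OPCMSG and PREFIX are illustrative settings, adjust them to your site.
OPCMSG='/opt/OV/bin/OpC/opcmsg'
PREFIX='-AaYy-'

# build_opcmsg SEVERITY APPLICATION OBJECT MSG_GRP NODE TEXT
# Prints the command line; the message text is prefixed with PREFIX so
# that the custom OVO template can recognize it.
build_opcmsg() {
    printf '%s severity=%s application=%s object=%s msg_grp=%s node=%s msg_text="%s%s"\n' \
        "${OPCMSG}" "$1" "$2" "$3" "$4" "$5" "${PREFIX}" "$6"
}

# Example: a link-down alarm correlated on "device456-3".
build_opcmsg critical eHealth device456-3 eHealth gateway01 \
    'Link down on interface 3'
```

Printing the command first makes it easy to check the quoting by eye before handing anything to the real opcmsg binary.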

For our purposes, we use each field in the opcmsg command as 
follows: 


* Severity — Specifies one of OVO’s severity levels: normal, 
  warning, minor, major, or critical. SNMP devices may define 
  severity differently in their respective MIB. In such cases, the 
  SNMP device’s severity must be mapped to OVO’s by the custom Perl 
  script. 

* Application — Denotes the application generating the event. This 
  can be assigned on a per-MIB basis. 

* Object — Used as the correlator between alarm and clearing 
  events. It must be unique so that a clearing event can be matched 
  to the alarm event and cause it to be removed from the OVO event 
  browser. In most cases, you have to concatenate the device name 
  generating the event and one or more varbind values. For example, 
  for SNMP linkup and linkdown traps, the reporting node name and 
  the varbind “ifIndex”, combined, will uniquely match these events. 
  If the names of the device and the ifIndex are device456 and 3, 
  respectively, object can be set to “device456-3” to be used as 
  the correlator. 

* Msg_text — Contains the text you want to appear in the OVO 
  event browser. This is specific to each individual trap. This is 
  prefixed with a short special text string to trigger the custom 
  template for message correlation (described next). Let’s define 
  this special prefix string as “-AaYy-”. 

* Msg_grp — The message group in which you want the message 
  to appear. The message group must have already been defined in 
  OVO. 

* Node — The node appearing in the OVO event as the one 
  generating the event. The node must have already been defined in 
  the OVO Node Bank. If the devices sending traps have not 
  been defined in the OVO Node Bank, then the node name of 
  the gateway host that forwards these events must be specified 
  here. 


Figure 3 Configuration file selection page 


The custom OVO template for handling these opcmsg messages 
must perform the following: 


* Identify the special short prefix text string in the Msg_text 
  using Message Key and Message Key Relation defined in the 
  Message Correlation Window. 

* Use the object field as the correlator for message correlation. 

* Remove the special short prefix from msg_text before display. 

* Let a less severe event cancel a more severe event. For example, 
  major cancels critical, minor cancels both major and critical, 
  and so on. 


Defining the above template only requires a few clicks and entering 
the Message Key and Message Key Relation information for OVO 
message correlation using the OVO administrator’s graphical user 
interface. Consequently, I am going to concentrate the rest of the 
article on the generation of the appropriate opcmsg command and 
configuring the NNM event system. 


GUI to Generate Custom Script 
and NNM Event Configuration 

It is true that the appropriate opcmsg commands can be manually 
configured as automatic actions using NNM’s xnmtrap GUI. However, 
specifying an opcmsg command directly precludes the possibility of 
dynamic severity translation based on the value of a trap varbind 
because certain MIBs define their own severity levels that are 
quite different from OpenView’s. Also, owing to the complex syntax 
of the opcmsg command, it is unlikely that it can be coded correctly 
the first time. This is why I have developed a Web-based user 
interface to facilitate the creation of a custom Perl script and a 
partial trapd.conf file for use in replacing the existing section of 
the trapd.conf file. This makes it easy to configure and customize 
the forwarding of traps using the approach described previously. 

The Web application is named trapapp. 
It contains four menu items: 


* Upload MIB File — If you want to customize traps from a 
  particular MIB, you must first upload the MIB using this 
  menu. This is the default home page if there are no MIB-specific 
  trapapp configuration files on the system (e.g., when the 
  Web application is run for the first time). Figure 2 is the 
  screenshot for this page. A MIB-specific trapapp configuration 
  file is created once the MIB has been uploaded to the 
  application. The name of the configuration file is the same as 
  the MIB file uploaded with the file extension “.conf” 
  appended. The “Edit MIB-specific Configuration” page is 
  displayed after the upload. 


December 2005 


www.sysadminmag.com 


* Select Customization Configuration File — If there are MIB-specific 
  trapapp configuration files created by the application on the 
  system, this will be the home page. You will be presented with a 
  selection list to pick an existing MIB-specific trapapp 
  configuration for modification. This page is shown in Figure 3. The 
  “Edit MIB-specific Configuration” page is displayed after the 
  selection. 

* Edit MIB-specific Configuration — This menu item is available 
  only when a particular MIB file has been uploaded or when a 
  MIB-specific trapapp configuration file has been selected. It 
  displays a page that allows you to customize each trap. You pick 
  the trap you want to customize by clicking on the trap selection 
  button as shown in Figure 4. This brings up the page shown in 
  Figures 5a and 5b for you to enter detailed customization 
  information. 

* Generate Script and Partial trapd Configuration — Again, this 
  menu item is available only when a particular MIB file has been 
  uploaded or when a MIB-specific trapapp configuration file has 
  been selected. It displays a page that allows you to complete any 
  additional information needed for customizing a Perl script that 
  assembles and executes an opcmsg command (Figure 6). The 
  customization information entered is validated before the script 
  and the partial trapd.conf file are generated. If there is any 
  missing or inconsistent information in the MIB-specific trapapp 
  configuration file, error messages will be displayed and no script 
  and configuration files will be generated until all identified 
  problems have been rectified. Figure 7 shows configuration 
  problems identified by the validation process that could be 
  difficult to spot if you were configuring opcmsg commands manually 
  using xnmtrap. The name of the partial trapd.conf file generated 
  is the same as the MIB’s but with “.trapd.conf” appended. 
  Additionally, it also updates the trusted command file “trapapp” 
  in the $OV_CONF/trustedCmds.conf directory. 


Figure 4 Main trap configuration page 


Figure 5a Trap details configuration page 


46 — Sys Admin 
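
One of the checks mentioned above, a message format that refers to a varbind the trap does not have, is easy to picture in shell. The following is a sketch under my own assumptions, not the code trapapp uses (trapapp is written in Perl): the function name check_format_refs is invented, and it takes the number of varbinds as a plain argument rather than reading it from the configuration file.

```shell
#!/bin/sh
# Hypothetical check, not taken from trapapp.
# check_format_refs TRAPNUM FORMAT NVARBINDS
# Prints one error line for every $N in FORMAT that is larger than the
# number of varbinds the trap defines ($0 is always allowed).
# Returns 0 when the format is consistent, 1 otherwise.
check_format_refs() {
    trapnum=$1
    nvarbinds=$3
    status=0
    # Collect the distinct numbers that follow a "$" in the format.
    refs=$(printf '%s\n' "$2" | grep -o '\$[0-9][0-9]*' | tr -d '$' | sort -nu)
    for ref in ${refs}
    do
        if [ "${ref}" -gt "${nvarbinds}" ]
        then
            echo "Error: message format for trap ${trapnum} references non-existent varbind \$${ref}."
            status=1
        fi
    done
    return ${status}
}
```

For a trap with eight varbinds, a format that mentions $9 would be rejected with a message like the first one in Figure 7.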


Implementation 


The trapapp application is implemented using Perl. It consists of 
the Perl script trapapp.pl and the Perl module MIBFSM.pm. The 
only non-standard module used is Config::IniFiles, which 
reads/writes Windows .ini-style configuration files. You have to 
download and install the Config::IniFiles module before you can 
use trapapp. Trapapp.pl implements all the form displays and gath- 
ering of information for generating the custom Perl script and par- 
tial trapd.conf file. You will notice that there are a number of files 
with a “tpl” file extension. Trapapp uses a simple templating sys- 
tem to create Web page content and the custom Perl script based on 
these template files. They are text files with special markers that 
look like: 


%NAME% 


These special markers are replaced by the values of hash entries 
with keys equal to the NAME part of the marker. The text replace- 
ment is handled by the procedure named template. 
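
The marker replacement can be pictured with the shell sketch below. It is not the template procedure from trapapp.pl, which is written in Perl; the function name render_template and the NAME=value argument convention are assumptions made for this example, and markers without a matching name are simply left in place.

```shell
#!/bin/sh
# Hypothetical stand-in for a template procedure, for illustration only.
# render_template TEXT NAME=value ...
# Replaces every %NAME% marker in TEXT with the value given for NAME.
render_template() {
    tpl=$1
    shift
    TPL_TEXT="${tpl}" awk 'BEGIN {
        # The remaining arguments are NAME=value pairs.
        for (i = 1; i < ARGC; i++) {
            eq = index(ARGV[i], "=")
            if (eq > 1)
                value[substr(ARGV[i], 1, eq - 1)] = substr(ARGV[i], eq + 1)
        }
        rest = ENVIRON["TPL_TEXT"]
        out = ""
        while (match(rest, /%[A-Za-z_][A-Za-z0-9_]*%/)) {
            name = substr(rest, RSTART + 1, RLENGTH - 2)
            out = out substr(rest, 1, RSTART - 1)
            if (name in value)
                out = out value[name]
            else
                out = out "%" name "%"   # unknown marker: keep it
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }' "$@"
}
```

For example, render_template '<h1>%TITLE%</h1>' 'TITLE=Upload File' prints the heading with the marker filled in.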

The MIBFSM.pm module implements a finite state machine 
specifically designed for a single purpose: the extraction of V1 
TRAP-TYPE or V2 NOTIFICATION-TYPE information that 
includes trap number, trap name, and varbinds from uploaded 
MIB files in ASN.1 syntax. It does not recognize any other MIB 
definitions syntax. 


Figure 5b Trap details configuration page continued 


As you enter information to customize traps using the trapapp 
application, the information is saved in different sections of a con- 
figuration file in Windows .ini file format using the Config::IniFiles 
module. A configuration file consists of five sections: trap, format, 
alarm, clearing, and opcinfo. The format of each section is 
described below (please note that you do not have to know the 
format of the configuration file to use the application): 


[trap] 
trapnum=trapname~varbind1~varbind2~...~varbindN 


The purpose of the trap section is to record the trap name and its 
varbinds and associate that information with a trap number, for 
example: 


15=netHealthInfo~nhdErrorDate~nhdErrorTime~nhdErrorCode~nhdError \
Message~nhdServerIp~nhdServerName~nhServerPort~nhElementId 


Figure 6 Script generation page 


Figure 7 Script generation error message page 

Error: message format for trap 15 references non-existent varbind $9. 
Error: correlator format references different varbinds for alarm 
trap 15 and clearing trap 25: nhdErrorDate vs nhServerIp 
Error: correlator format references different varbinds for alarm 
trap 15 and clearing trap 25: nhdErrorMessage vs nhResetTime 
Please fix problem(s) and try again! 


[format] 
trapnum=eventType~messageFormat 


where “eventType” can be either alarm, clearing, or ignore. 

The purpose of the format section is to identify the traps used for 
generating alarm and clearing events and their message format. 
Traps with eventType ignore are ignored during script generation, 
for example: 


21=alarm~Alarm raised for $5, $11 for $12: $8 \
(NH:$2,$3,$6,$16,$17,$9,$10,$5) 
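
To make the $N notation concrete, here is a small shell sketch that expands such a format string. It assumes, as the nodename setting and the sample correlator suggest, that $0 stands for the agent (node) name and $1, $2, ... for the trap's varbinds in order. The function name expand_format is invented for the example, and references beyond the supplied values expand to nothing.

```shell
#!/bin/sh
# Hypothetical illustration of $N expansion, not the generated script.
# expand_format FORMAT VALUE0 VALUE1 VALUE2 ...
# VALUE0 replaces $0 (the node name), VALUE1 replaces $1, and so on.
expand_format() {
    fmt=$1
    shift
    FMT_TEXT="${fmt}" awk 'BEGIN {
        rest = ENVIRON["FMT_TEXT"]
        out = ""
        while (match(rest, /\$[0-9]+/)) {
            n = substr(rest, RSTART + 1, RLENGTH - 1) + 0
            out = out substr(rest, 1, RSTART - 1)
            # ARGV[1] holds the value for $0, ARGV[2] the one for $1, ...
            if (n + 1 < ARGC)
                out = out ARGV[n + 1]
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }' "$@"
}
```

The same routine covers message formats and correlator formats: expand_format '$0-$1' device456 3 yields device456-3, the correlator used in the linkup/linkdown example earlier.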


[alarm] 
trapnum=correlatorFormat~varbind/fixedSeverity~toNormal~toWarning \ 
~toMinor~toMajor~toCritical 


where varbind/fixedSeverity contains either a fixed OpenView sever- 
ity (one of normal, warning, minor, major, or critical) or a varbind. In 
the latter case, toNormal, toWarning, etc. contains the translation 
mapping from the varbind’s severity to that of OpenView’s. 

The purpose of the alarm section is to define the correlator for 
alarm and clearing events and translation mapping from MIB-spe- 
cific severity to that of OpenView’s: 


21=$0-$5-$16~nhdErrorMessage~normal~warning~minor~major~critical 
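
The translation step can be sketched in shell as follows. This is my own illustration of how an [alarm] entry could be interpreted, not code from the generated Perl script: the function name map_severity is invented, and I am assuming that each toXxx field may hold a comma-separated list of MIB-specific severities.

```shell
#!/bin/sh
# Hypothetical helper, not part of trapapp.
# map_severity ENTRY MIBSEVERITY
# ENTRY is the value of an [alarm] line:
#   correlatorFormat~varbind/fixedSeverity~toNormal~toWarning~toMinor~toMajor~toCritical
# Prints the OpenView severity, or returns 1 if nothing matches.
map_severity() {
    entry=$1
    mibsev=$2
    old_ifs=${IFS}
    set -f                      # no filename expansion while splitting
    IFS='~'
    set -- ${entry}             # split the entry on "~"
    IFS=${old_ifs}
    set +f
    [ $# -ge 2 ] || return 1

    # A fixed OpenView severity in the second field is used as-is.
    case "$2" in
        normal|warning|minor|major|critical) echo "$2"; return 0 ;;
    esac

    [ -n "${mibsev}" ] || return 1
    shift 2                     # $1..$5 are now toNormal..toCritical
    for ovsev in normal warning minor major critical
    do
        [ $# -gt 0 ] || break
        case ",$1," in
            *",${mibsev},"*) echo "${ovsev}"; return 0 ;;
        esac
        shift
    done
    return 1
}
```

With an entry whose toCritical field reads critical,fatal, a trap carrying the MIB severity fatal would be forwarded as an OpenView critical event.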


[clearing] 
trapnum=alarmTrapNum~fixedSeverity 
where alarmTrapNum specifies the alarm trap this event is to clear 
and fixedSeverity specifies the fixed OpenView severity of the trap. 
For example: 


23=16~normal 


[opcinfo] 
appname=appName 
messagegroup=groupName 
nodename=nodeName or $0 


If a specific nodeName is given, it will be used as the name of the 
sender of the trap. If $0 is specified, the host name of the agent send- 
ing the trap will be specified as the source of the message. If you are 
using a gateway host to send information on devices not configured 
in the OVO Node Bank, use a fixed nodename instead of $0. 

The purpose of this section is to supply the information to com- 
plete the opcmsg command: 


appname=eHealth 
messagegroup=eHealth 
nodename=$0 


The trapapp Web application has a configuration file named 
trapapp.ini. Its content is shown below: 


[app] 
scriptdir=./gen/ 
opcmsg=/opt/OV/bin/OpC/opcmsg 
trapdconf=C:/Program Files/HP OpenView/NNM/conf/C/trapd.conf 
proddir=/opt/trapapp/ 
trustedcmddir=C:/Program Files/HP OpenView/NNM/conf/trustedCmds.conf/ 


It consists of an [app] section and the following entries: 


* scriptdir — Specifies the directory in which the generated Perl 
  script and partial MIB-specific trapd.conf file are to be placed. 
  The specified directory must already exist. 

* opcmsg — Defines the full path of the opcmsg command. 

* proddir — Specifies the directory in which the production version 
  of the generated Perl script is kept. It is safer to copy the 
  generated script to a production directory to avoid unintentional 
  changes using the Web-based GUI. This path is used to configure 
  the trusted commands only. Trapapp does not perform the copying 
  of scripts from the scriptdir directory to the proddir directory. 

* trapdconf — Defines the complete path to NNM’s trapd.conf file. 
  Note that the path is different on different platforms (e.g., 
  Windows and Unix). 

* trustedcmddir — Specifies the trusted command directory where 
  a script’s name must be listed before it will be executed as an 
  automatic action. 
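
If you want to reuse these settings from a shell script of your own, a lookup along the following lines would do. This is a sketch only: trapapp itself reads the file with Config::IniFiles, the function name ini_get is made up, and the parsing is deliberately simple (no trimming of spaces around the key, no comment handling).

```shell
#!/bin/sh
# Hypothetical reader for .ini-style files, for illustration only.
# ini_get FILE SECTION KEY
# Prints the value of KEY inside [SECTION]; prints nothing if absent.
ini_get() {
    awk -v section="$2" -v key="$3" '
        /^[ \t]*\[/ {
            # A new section header: remember whether it is the one we want.
            name = $0
            sub(/^[ \t]*\[/, "", name)
            sub(/\].*$/, "", name)
            in_section = (name == section)
            next
        }
        in_section {
            eq = index($0, "=")
            if (eq > 0 && substr($0, 1, eq - 1) == key) {
                print substr($0, eq + 1)
                exit
            }
        }
    ' "$1"
}
```

With the file shown above, ini_get trapapp.ini app opcmsg would print the configured path of the opcmsg command.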


When you click on the Generate button, the application performs 
the following tasks: 


• Carries out consistency checks on the configuration information.
If the configuration fails a check, all of the tasks below are
skipped and error messages are displayed.

• Extracts the validated information from the MIB-specific
configuration file and creates the custom Perl script by using the
script.tpl template file.

• Uses the Perl “..” operator to extract the relevant sections in the
trapd.conf file, adds the automatic action, and saves this partial
trapd.conf file for use by the administrator to update the
production trapd.conf file using the xnmevents -replace command.

• Adds the name of the custom Perl script to the trusted command
list file, $OV_CONF/trustedCmds.conf/trapapp, if the file does
not exist or the Perl script name entry is absent from the file.
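
The range extraction in the third task can be sketched in shell, where a sed address range plays the role of the Perl “..” operator. This is only an illustration, not trapapp's code. It assumes each event definition starts with a line beginning with EVENT and ends with a line beginning with EDESC, and the sample events and OIDs below are made up:

```shell
#!/bin/sh
# Sketch: print one EVENT ... EDESC block from a trapd.conf-style file.
# The block delimiters are an assumption; the sample content is made up.
extract_event() {
    # $1 = event name, $2 = file
    sed -n "/^EVENT $1 /,/^EDESC/p" "$2"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
EVENT otherTrap .1.3.6.1.4.1.99999.0.1 "LOGONLY" Normal
FORMAT other
SDESC
EDESC
EVENT nhLiveAlarm .1.3.6.1.4.1.99999.0.21 "Status Alarms" Major
FORMAT live alarm
SDESC
EDESC
EOF

# Prints only the four lines of the nhLiveAlarm block.
extract_event nhLiveAlarm "$sample"
rm -f "$sample"
```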


In summary, the trapapp Web application is designed to reduce to a 
minimum the amount of manual configuration required on NNM 
collection stations. 


Setup and Usage 

I assume here that the reader is familiar with the configuration of 
a Web server, creating OVO templates, and configuring NNM auto- 
matic actions using the respective OVO and NNM tools. If so, then 
setting up the trapapp Web application is simple. All it does is gen- 
erate a custom Perl script and a partial trapd.conf file for each MIB 
you uploaded. To set up trapapp, complete the following steps: 


1. Download and install the Config::IniFiles Perl module either
from CPAN or ActivePerl.

2. Unzip (on Windows) or jar xvf (on Unix, assuming you have
Java 2 SDK installed) the trapapp.zip file into its own directory
named trapapp (any name will do).

3. Configure a virtual directory on the Web server named ovtrap to
point to that directory.

4. Configure the default document on the Web server as index.html.

5. Invoke the trapapp application by using the URL:
http://localhost/trapapp/


You can now start experimenting with the trapapp application and
examine the generated partial trapd.conf file. Once you are
comfortable with it, you can update NNM’s trapd.conf file with the
information it contains by using the NNM command xnmevents:


xnmevents -replace partialConfigFile


where partialConfigFile is the full path name of the generated par- 
tial trapd.conf file. 

It is trivial to include a menu item in the Web application to do 
this. I do not implement it because providing such a convenience 
feature makes it too easy to change the production trapd configura- 
tion unintentionally while experimenting with the Web GUI. It is 
exactly for the same reason that the trusted command list contained 
in the $OV_CONF/trustedCmds.conf/trapapp is only updated if the 
generated Perl script name is absent in the file. 
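
The "only update if absent" rule is easy to sketch in shell. The following is an illustration of the idea rather than trapapp's own code. The function name, the list file, and the entry format are hypothetical placeholders:

```shell
#!/bin/sh
# Sketch: add an entry to a trusted-command list only when it is missing.
# The entry text and the list file are hypothetical placeholders.
add_if_absent() {
    # $1 = entry, $2 = list file
    [ -f "$2" ] || : > "$2"     # create the file if it does not exist
    # -x matches whole lines, -F treats the entry as a fixed string
    grep -qxF -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
}

list=$(mktemp)
add_if_absent "myscript=/opt/trapapp/myscript.pl" "$list"
add_if_absent "myscript=/opt/trapapp/myscript.pl" "$list"   # no-op
wc -l < "$list"     # the file still holds a single line
rm -f "$list"
```

Because the second call changes nothing, regenerating from the Web GUI repeatedly cannot pile up duplicate entries or overwrite an entry that an administrator edited by hand.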

Additionally, there is a one-off setup involving the following 
tasks on the OVO Server: 


1. Define an OVO template that implements the message correlation
facility described earlier.


2. Distribute the template to the OVO agent on the OVO server. 


When things do not work as expected, the following checklist may 
help you resolve the problems: 


1. Does the information in the trapapp.ini file correctly reflect your
environment?

2. Has the message group specified in the opcmsg command been
defined in OVO?

3. Has the node name used in the opcmsg command been configured 
in the OVO Node Bank? 

4. Has the MIB you uploaded to trapapp been loaded using NNM’s 
xnmloadmib tool? 

5. Have you updated the trapd.conf file by using the xnmevents 
command as described previously? 

6. Is the path information contained in
$OV_CONF/trustedCmds.conf/trapapp for the custom Perl scripts correct?

7. Has the generated Perl script been moved to the production direc- 
tory and made executable (on Unix platforms only)? 


An Example 


If you use Concord eHealth in your organization, it is likely that 
you will have installed eHealth Integration for OpenView on one or 
more of your NNM collection stations. The installation includes 
loading the concord-diagmon.mib and setting up eHealth-specific 
menu items in the OpenView graphical user interface. It is quite 
common that not all servers managed by eHealth are configured in 
the OVO Node Bank or even in the OpenView object database. The 
approach described in this article handles this situation. 

An example configuration for the concord-diagmon.mib is set 
up when you installed trapapp. In this sample configuration, trap 
21 (nhLiveAlarm) and 23 (nhLiveClearAlarm) have been set up 
as alarm and clearing events, respectively. The correlator used is 
$A-$5-$16 (i.e., agentHostName-nhElementIp-nhExceptionId).
nhLiveAlarm uses the content of the varbind nhSeverity as the severity
to use in the opcmsg command. In this case, the possible values of
nhSeverity are the same as those for OpenView.


Limitations 


Please note that this Web application is intended for OpenView 
administrators’ personal use only. It is not intended to be used by 
the casual OpenView user. No security or concurrent access con- 
trol has been implemented. Hence, if more than one user is updat- 
ing the configuration for the same MIB at the same time, the result 
is unpredictable. 


Conclusion 


The trapapp Web application presented in this article provides 
a convenient way to customize relevant incoming traps to NNM 
collection stations and convert them into opc messages for
forwarding to OVO for display and message correlation. It also helps
to ensure the quality of the trap customization due to its extensive 
check on consistency of the configuration. You can find trapapp 
updates/fixes and other free tools under “Free Tools” on the Web 
site of Kardinia Software at: http://www.kardinia.com. If you
have enhanced and/or fixed any bugs in trapapp, please send me a 
description and your changes so that I can post them on the 
Kardinia Web site. 


Andy Yuen is a solutions architect who specializes in application development 
and system/network management. He has a master’s degree in Electrical 
Engineering from Carleton University, Canada. He can be contacted at: 
andyyuen@ozemail.com.au.


John & Ed’s Scripting Screwups


We believe that you don’t learn anything or do something
without making mistakes. Because this is our last column
of the year, we’d like to discuss this year’s blunders
and enhancements:


• We enhanced the ls_floppy script that was part of our
“Manipulating Floppy Disk Images on Solaris” column.

• We plugged a security hole in the share screen section of our
“Using Screen in Scripts” column.

• We fixed an exit code/return value bug in our semaphore script as
described in “Queuing Jobs with qjob”.

• We changed the pushd/popd functions as presented in Bill
Rosenblatt’s Learning the Korn Shell. This isn’t really our bug,
but it keeps with the theme of the column.


Floppy Enhancement 


Our “Manipulating Floppy Disk Images on Solaris” column pre- 
sented three shell scripts for managing the floppy drive. Originally, 
the scripts were developed with Solaris 7. When upgrading to 
Solaris 9, the ls_floppy script failed because one of the floppy com- 
mands, volrmmount, wasn’t making a temporary directory quickly 
enough. The script’s next command was executing before the direc- 
tory existed. 

To fix the problem, we inserted a five-second sleep after execut- 
ing the volrmmount command and before executing the change 
directory command: 


sleep 5
if cd "$dir"
then
    # list contents of the floppy
    ls $options
fi
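
A fixed sleep is what we used, but a polling loop is a possible alternative that returns as soon as the directory appears. The sketch below is not part of the original ls_floppy script. The function name wait_for_dir and the five-second limit are our own illustrative choices:

```shell
#!/bin/sh
# Sketch: wait up to $2 seconds for directory $1 to appear,
# checking once a second. Returns 0 if it appears, 1 on timeout.
wait_for_dir() {
    tries=0
    while [ ! -d "$1" ]; do
        [ "$tries" -ge "$2" ] && return 1
        sleep 1
        tries=$((tries + 1))
    done
    return 0
}

# Demonstration with a directory that already exists.
demo=$(mktemp -d)
if wait_for_dir "$demo" 5 && cd "$demo"
then
    ls
fi
cd / && rmdir "$demo"
```

On a fast machine the loop exits immediately, while on a slow one it waits only as long as needed, up to the limit.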


Using Screen Security Problem 


Our major blunder of the year was in the April “Using Screen in
Scripts” column. Sharp-eyed readers Kevin Turner and Rod
Knowlton pointed out a security hole in the “Using sudo in Share
Screens” section. Our share-screen script allows users to share
screens. Unfortunately, when sharing a root-generated screen,
simply pressing “Ctrl-A c” generates another screen with root
access, which is very undesirable.

To solve our share screen problem, we have abandoned screen.
Presently, we are planning to use the Expect script kibitz. The kibitz
script, bundled with the new versions of expect, allows two users to
interact with one shell. The kibitz script is easy to use. From the
kibitz man page:


To start kibitz, user1 runs kibitz with the argument of the user to
kibitz. For example:


kibitz user2 


kibitz starts a new shell (or another program, if given on the 
command line), while prompting user2 to run kibitz. If user2 
runs kibitz as directed, the keystrokes of both users become the 
input of the shell. Similarly, both users receive the output from 
the shell. 


Finally, the online screen documentation link in our original column 
is dead; see the reference section for a replacement. 


Return Code Bug 


In the “Queuing Jobs with qjob” column, we reported fixing a 
bug in our semaphore script but didn’t mention what that bug was. 
The bug is obscure enough that we feel compelled to review it. 

The gist of the problem is that we misidentified the exit 
code/return value of a function when used in conjunction with the 
negation operator: 


if ! keep_trying 
then 

exit $? 
fi 


We assumed that if the keep_trying function returns a value 
greater than zero, the not comparison evaluates to true, and the 
exit code equals the keep_trying function’s return value. 
Unfortunately, this is a bad assumption. The negation operator 
resets the exit code masking whatever the keep_trying function’s 
return value was. 

Consider this example: 


keep_trying()
{
    return 5
}
if ! keep_trying
then
    echo $?    # echo's 0
fi

Because keep_trying’s return value is false (non-zero), the negation
operator sets the exit code to 0, that is, true, or not false.

Remember that negation returns true when evaluating not false 
and false when evaluating not true. If this statement is confusing, 
execute the following comparisons: 


if ! false 
then 

echo $? # echo's 0 
fi 


if ! true
then
    :          # never reached; an empty then-branch is a syntax error
else
    echo $?    # echo's 1
fi


If you really want the function’s return value, do not use negation, 
as in this example: 


keep_trying()
{
    return 5
}
if keep_trying
then
    echo "true: $?"
else
    echo "false: $?"    # echo's 5
fi
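
If you want both an early exit on failure and the function's own return value, two other patterns avoid the negation operator altogether. This is a sketch rather than the code from our semaphore script, and run_saved and run_short are hypothetical wrapper names:

```shell
#!/bin/sh
keep_trying() {
    return 5
}

# Option 1: save the status immediately, then test the saved copy.
run_saved() {
    keep_trying
    rc=$?
    if [ "$rc" -ne 0 ]; then
        return "$rc"
    fi
}

# Option 2: "||" runs its right-hand side only on failure, and $? there
# still holds keep_trying's status because no "!" has rewritten it.
run_short() {
    keep_trying || return $?
}

run_saved || echo "saved: $?"    # prints "saved: 5"
run_short || echo "short: $?"    # prints "short: 5"
```

In a script, replacing return with exit in either wrapper gives the behavior we originally intended.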


The pushd & popd Problem 


We like Bill Rosenblatt’s Learning the Korn Shell, but we dis- 
agree with his implementation of pushd and popd. The pushd and 
popd functions “enable you to move to another directory temporar- 
ily and have the shell remember where you were”. (Rosenblatt’s 
functions are a subset of the C shell functions of the same name.) 

Basically, executing a push with an argument saves the argument 
and changes to that destination; executing a pop enough times — no 
matter how many times push was executed — changes to the origi- 
nal directory. 

Rosenblatt emulates pushing and popping directory names by 
storing them in shell variable DIRSTACK. When implementing 
Rosenblatt’s functions, we discovered the following three issues: 


1. In the pushd function, every argument pushed onto the stack must
be delimited by a space in order to pop it off:

DIRSTACK=${DIRSTACK#* }

Unfortunately, the first argument pushed onto the stack is not
space delimited; thus, the first argument never gets popped.

2. In the pushd function, Rosenblatt executes a change directory
before pushing, which incorrectly initializes the stack. This
causes the stack not to return to the original directory.

3. In the popd function, Rosenblatt checks for a null stack before
changing to the destination directory. Ultimately, this executes a
change directory command with a null argument placing users
back in their home directory, which is probably not the original
directory.


Setting the Environment


In the rest of the column, we’ll explain our fixes to Rosenblatt’s 
functions. Rosenblatt presents his solutions as functions. For 
demonstration purposes, we are presenting his code (Listing 1, 
br_pushd, Listing 2, br_popd) and our solution (Listing 3, 
our_pushd, Listing 4, our_popd) as scripts. 

We’ve also created four aliases that easily source the scripts:


br_pushd='. /home/jspurgeo/sandbox/br_pushd'
br_popd='. /home/jspurgeo/sandbox/br_popd'
our_pushd='. /home/jspurgeo/sandbox/our_pushd'
our_popd='. /home/jspurgeo/sandbox/our_popd'


To ease debugging, the following alias prints the DIRSTACK shell 
variable surrounded by double-quotes: 


print_stack='echo "DIRSTACK=\"$DIRSTACK\""'


The Terminating Space Problem 


Assuming our original directory is /home/jspurgeo/sandbox, 
using Rosenblatt’s example, push a directory onto the stack: 


$ br_pushd /etc 
/etc /etc 


Executing multiple br_popd calls verifies that the stack never
empties and never returns to the original directory. Execute the
print_stack alias to view the stack:

DIRSTACK="/etc" 

The Korn shell expression requires a space at the end of the first 
argument. In the pushd function, to obtain the terminating space, 
change this code: 

DIRSTACK="$dirname ${DIRSTACK:-$PWD}" 

to: 

DIRSTACK="$dirname ${DIRSTACK:-$PWD }" 


With this change, execute the test again: 


$ br_pushd /etc 
/etc /etc 


Execute br_popd, view the stack with print_stack, and verify a 
space exists: 


DIRSTACK="/etc " 


Now, execute br_popd twice. You receive a “stack empty” message,
and a call to print_stack displays an empty stack:


DIRSTACK="" 
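
To see why the terminating space matters, it helps to try the pop expression on its own. The sketch below uses a small helper we made up for this purpose, pop, which prints what is left of a stack string after one pop, with brackets added so that trailing spaces are visible:

```shell
#!/bin/sh
# Sketch: how ${var#* } behaves with and without the terminating space.
pop() {
    # Print what remains of the stack string $1 after one pop.
    s=$1
    printf '[%s]\n' "${s#* }"
}

pop "/etc"                           # prints [/etc]  (no space, nothing removed)
pop "/etc "                          # prints []      (the only entry is removed)
pop "/etc /home/jspurgeo/sandbox "   # prints [/home/jspurgeo/sandbox ]
```

The pattern "* " needs at least one space to match, so an entry without a trailing space can never be removed.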


The Stack Initialization Problem 


You probably noticed that although the stack empties, it doesn’t
ultimately return to the original directory. This happens because the
stack is initialized with the wrong directory. Simply flip the
following two lines:

cd $dirname
DIRSTACK="$dirname ${DIRSTACK:-$PWD }"


to read: 


DIRSTACK="$dirname ${DIRSTACK:-$PWD }" 
cd $dirname 


Now, execute the push test again: 


$ br_pushd /etc 
/etc /home/jspurgeo/sandbox 


Note that the stack reads “/etc /home/jspurgeo/sandbox” where 
before the output read “/etc /etc”. 


The Null Stack Check Problem 


Running the push/pop test again shows the user ultimately back 
in his home directory: 


$ pwd 
/home/jspurgeo/sandbox 


$ br_pushd /etc 
/etc /home/jspurgeo/sandbox 


$ br_popd 
/home/jspurgeo/sandbox


$ print_stack
DIRSTACK="/home/jspurgeo/sandbox " 


$ br_popd 
/home/jspurgeo 


$ br_popd 
stack empty, still in /home/jspurgeo. 


Ending in the user’s home directory happens because the Unix cd 
command is executed with a null argument when the stack empties. 
This happens because Rosenblatt performs the stack null check 
before popping an argument from the stack: 


Listing 1 Bill Rosenblatt’s pushd
#!/bin/ksh

dirname=$1

if [[ -d $dirname && -x $dirname ]]; then
    cd $dirname
    DIRSTACK="$dirname ${DIRSTACK:-$PWD}"
    print "$DIRSTACK"
else
    print "still in $PWD."
fi
# end listing 1


Listing 2 Bill Rosenblatt’s popd
#!/bin/ksh

if [[ -n $DIRSTACK ]]; then
    DIRSTACK=${DIRSTACK#* }
    cd ${DIRSTACK%% *}
    print "$PWD"
else
    print "stack empty, still in $PWD."
fi
# end listing 2


Listing 3 Our edited pushd 
#!/bin/ksh 


dirname=$1 

if [[ -d $dirname && -x $dirname ]]; then 
DIRSTACK="$dirname ${DIRSTACK:-$PWD }" 
cd $dirname 
print "$DIRSTACK" 

else 
print "still in $PWD." 

fi 

# end listing 3 


Listing 4 Our edited popd
#!/bin/ksh

DIRSTACK=${DIRSTACK#* }
if [[ -n $DIRSTACK ]]; then
    cd ${DIRSTACK%% *}
    print "$PWD"
else
    print "stack empty, still in $PWD."
fi
# end listing 4


if [[ -n $DIRSTACK ]]; then
    DIRSTACK=${DIRSTACK#* }


Fix the problem by popping the argument before performing the 
null check: 


DIRSTACK=${DIRSTACK#* }
if [[ -n $DIRSTACK ]]; then


Perform the push/pop test again and verify returning to the original 
directory: 


$ pwd 
/home/jspurgeo/sandbox 


$ br_pushd /etc 
/etc /home/jspurgeo/sandbox 


$ br_popd 
/home/jspurgeo/sandbox 


$ br_popd 
stack empty, still in /home/jspurgeo/sandbox. 
Thanks to Ed Schaefer for pointing out a typographical error
in the answer regarding csplit in the October issue. In each
of the following lines, the backticks were converted to single
quotes when the issue went to print:


numrecs=`grep -c "^${recsep}" $1`
splitcount=`expr ${numrecs} - 2`
for ifile in `ls data*`; do
ofile=`head -1 ${ifile}`


Dennis Lang submitted a one-line awk solution to the same 
question. I’ve slightly modified the code he submitted to remove an 
UUOC and prevent the matching of false positives on the record 
separator: 


awk '/^xxx[0-9].*$/{f=$1;next;} {print >>f;}' input-file
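
Here is a small, hedged demonstration of how that one-liner behaves. It assumes the pattern is anchored to the start of the line with ^ and that separator lines look like xxx1, xxx2, and so on. The input data and the wrapper name split_records are made up for illustration:

```shell
#!/bin/sh
# Sketch: run the one-liner on a small made-up input in a scratch directory.
split_records() {
    # $1 = input file; output files are created in the current directory,
    # one per separator line, named after the separator's first field.
    awk '/^xxx[0-9].*$/{f=$1;next;} {print >>f;}' "$1"
}

work=$(mktemp -d)
(
    cd "$work" || exit 1
    printf 'xxx1\nalpha\nbeta\nxxx2\ngamma\n' > input-file
    split_records input-file
    cat xxx1    # alpha and beta
    cat xxx2    # gamma
)
rm -rf "$work"
```

Note that the input must begin with a separator line; otherwise awk has no file name to write to when it reaches the first data line.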


We’re running the stock Sun SSH that comes with Solaris 9,
and we enabled AllowTcpForwarding and X11Forwarding in
sshd_config. After HUPing sshd, new connection attempts
authenticate and then fail. /var/adm/messages includes lots of
errors that say:


Sep 19 13:54:44 hostname sshd[6523]: [ID 800047 auth.error] \ 
error: Failed to allocate internet-domain X11 display socket. 


Commenting the two entries back out again fixes the issue but, 
unfortunately, we can’t really go without X11 forwarding. Is there a 
workaround to this issue? 


You’ve run into another one of Sun’s SSH bugs, this time with
X11Forwarding. In this case, SPARC patch 118305-04 and x86
patch 117470-03 are at fault. If you back out of whichever patch is 
installed on your system and install SPARC patch 118335-04 or x86 
patch 120463-01, it should fix your problem. Sun documented this 
issue in infodoc 101834: 


http://sunsolve.sun.com/search/document.do?assetkey=1-26-101834-1 


Other people have suggested starting sshd in IPv4 mode only by 
editing the sshd_config file and specifying: 


ListenAddress 0.0.0.0 


Submit questions to: http://www.samag.com/columns/questions/


Amy Rich


We’re getting a rash of users who are reporting that they can no 
longer use POP to send and receive their mail. The users report 
the following error from their clients: 


Your server has unexpectedly terminated the connection. Possible causes 
for this include server problems, network problems, or a long period of 
inactivity. Account: 'username@our.domain', Server: 'mail.our.domain', 
Protocol: POP3, Server Response: '+OK 2019 octets', Port: 110,
Secure(SSL): No, Socket Error: 10053, Error Number: 0x800CCC0F


The common link with all these users seems to be that they’re scan- 
ning their messages with Norton AntiVirus. The only fix we’ve been 
able to suggest so far has been to turn off input message scanning. 
We'd like a better answer for our users so they can re-enable message 
scanning. Any suggestions? 


The problem you’re running into is that Norton stealthily
breaks TLS encryption between the client and server so it can
scan the now-non-encrypted messages. It should be implemented to 
scan the messages before making an outgoing SMTP connection 
instead of after the connection is made, but that’s not the way the 
software was designed. Your users can use POP over SSL on port 
995 to retrieve mail (or IMAP SSL on port 993 if they want to 
switch to IMAP) and the submission port (587) to submit mail. 
Norton only scans messages on ports 110 (pop3) and 25 (smtp). 
This means, of course, that the messages are still not being scanned, 
but Norton will not complain. For a real scanning solution, you’ll 
need to switch to another product. 


I just started a new job where I’m the only Unix systems
administrator. I’ve been trying to gather information about all
of the new machines, and I’m running into a problem with one spe- 
cific host. Two of the file servers I’ve inherited as part of my new 
position are identically configured E450s and fully populated with 
disks. One of these machines shows all 20 disks when I run prtdiag 
-V, but the other only shows disks 0 through 3. I know that the disks 
are functioning just fine because they’re in use on the fileserver. So, 
why can’t I see them all? 


The default Ultra Enterprise 450 configuration only supports
four disk drives connected to the internal backplane.
To support the 20 drives in your system, two 8-bay storage
expansion kits were installed as an upgrade. As part of the
upgrade, you need to set a variable, disk-led-assoc, in the
OBP to set up the mapping between disk slots and the physical
and logical device names. This is covered as part of Sun infodoc
16735:


http://sunsolve.sun.com/search/document.do?assetkey=1-9-16735-1


From the OBP, you need to run:


setenv disk-led-assoc 0 x y


where x is an integer between 1 and 10 identifying the rear panel
PCI slot number where the lower UltraSCSI controller is installed,
and y is an integer between 1 and 10 identifying the rear panel PCI
slot number where the upper UltraSCSI controller is installed. Slot
0 is the internal controller. If the other controller cards are installed
in slots 5 and 7, the command would be:


setenv disk-led-assoc 0 5 7


Once you set this variable, reset the system and then do a reconfigu- 
ration reboot with boot -r. 


Our boot disks are encapsulated using SVM under Solaris 9,
and we have an external RAID 5 set attached. We’ve somehow
managed to hose things quite spectacularly, and we need to boot 
from the JumpStart image on the network to try and repair things. 
This would be easy if we just ripped out SVM and booted off one of 
the unencapsulated drives, but we need to be able to access the 
RAID 5 device. Unfortunately, the JumpStart image doesn’t recog- 
nize SVM devices. I’m sure there must be a way around this, but 
I’m not sure how to make the JumpStart image read the RAID 5 
device. Can you offer any suggestions? 


Information on how to access a RAID 5 stripe set while boot- 
ing off the CD-ROM is covered in Sun infodoc 75210: 


http://sunsolve.sun.com/search/document.do?assetkey=1-25-75210-1 


The procedure for accessing it from a network boot is pretty much 
the same. To begin, boot single user mode off the network 
JumpStart image (this assumes that net is the network where your 
JumpStart image is located): 


boot net -s


Determine the id of the SVM metadevice driver:


# modinfo | grep md
 17 11be592  2d1b3  85 md (Solaris Volume Manager base mod)
  6 7824c000 d0c5      md_trans (Solaris Volume Manager trans mo)
  7 7823c000 ed04      md_raid (Solaris Volume Manager raid mod)
  8 7825a000 2a03    - md_hotspares (Solaris Volume Manager hot spar)
  9 78178000 4c3c      md_sp (Solaris Volume Manager soft par)
 50 139f480  5498      md_stripe (Solaris Volume Manager stripes )
 51 13a448c  12006     md_mirror (Solaris Volume Manager mirrors )
 68 134adfd  107d    - md5 (MD5 Message-Digest Algorithm)
246 7819f1d7 1004      md_notify (Solaris Volume Manager notifica)


Then unload the Solaris Volume Manager base module: 


modunload -i 17 
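
If you would rather not read the id off the screen, a short awk filter can pick it out. The sketch below is only an illustration: md_id is a hypothetical helper name, and it simply looks for a line with a field that is exactly md, so that md5, md_stripe, and the other md_* modules are not matched. The sample lines are abridged from the listing above:

```shell
#!/bin/sh
# Sketch: print the Id (first column) of the module named exactly "md".
md_id() {
    # Reads modinfo-style lines on standard input.
    awk '{ for (i = 2; i <= NF; i++) if ($i == "md") { print $1; exit } }'
}

# Prints 17 for this sample.
md_id <<'EOF'
 17 11be592 2d1b3 85 md (Solaris Volume Manager base mod)
 50 139f480 5498 md_stripe (Solaris Volume Manager stripes )
 68 134adfd 107d - md5 (MD5 Message-Digest Algorithm)
EOF
```

On the booted system, the same filter could be fed from modinfo itself and its result passed to modunload -i, but check the number by eye before unloading anything.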
Once you’ve unloaded the module, mount one of the unencapsulated
boot devices (the directions below assume that your root
filesystem is on c0t0d0s0) and copy the metadevice driver
configuration over to the running OS:


mount -r /dev/dsk/c0t0d0s0 /a
cp /a/kernel/drv/md.conf /kernel/drv/md.conf 
umount /a 


Now reload the md driver. This time it will read the information you 
copied from your boot disk: 


modload /kernel/drv/md 
metasync -r 


All of your original metadevice information should be available to 
commands like metastat and metadb now, and you should be able 
to mount the RAID 5 filesystem under /a. 


We're running a pretty vanilla Apache 1.3.33 on a load-balanced
set of FreeBSD 5.3 servers. We need to schedule some
site-wide downtime so we can shuffle a large amount of data around 
behind the scenes. While we’re down, we want to leave one server 
up, but redirect all traffic to a “we’re down right now, please come 
back after 9:00AM” sort of page. I was going to be clever about this 
and just set the ErrorDocument to this page, but I realized that the 
page requires an image as well as the text. This means I need to 
make allowances for more than one URL that does not redirect. 
What’s the best way to do this? 


Probably the easiest way is to use the RewriteEngine
instead of Redirect or RedirectMatch. Say you’ve replaced
index.html with the maintenance page, and that page includes the
image maintenance.png. You’d then have a set of rewrite rules like
the following:


RewriteEngine On
RewriteRule ^/$ - [L]
RewriteRule ^/index\.html$ - [L]
RewriteRule ^/maintenance\.png$ - [L]
RewriteRule ^/.*$ http://www.your.domain/ [R]


Be sure to comment out any other rewrite or redirect rules so you 
don’t have conflicts. 
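
The effect of the rules is that the first matching pattern wins: three URLs are served untouched, and everything else is redirected to the front page. The shell sketch below only mimics that decision so you can reason about it. It does not talk to Apache, and the function name decide is made up:

```shell
#!/bin/sh
# Sketch: mimic the decision the rewrite rules make for a request path.
decide() {
    case "$1" in
        /|/index.html|/maintenance.png)
            echo "serve" ;;                 # the three pass-through rules
        *)
            echo "redirect to /" ;;         # the catch-all redirect rule
    esac
}

decide /                   # serve
decide /maintenance.png    # serve
decide /products/list.cgi  # redirect to /
```

If the maintenance page later needs a stylesheet or a second image, each extra file needs its own pass-through rule ahead of the catch-all.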


We’re running a bunch of Solaris 9 machines that have
interfaces on both a public and a private network. For
mance and security reasons, we’re performing non-encrypted file 
transfers using rsync over the protected network. When we first 
started configuring this, we ran into an issue where we didn’t 
think things were working because we couldn’t get rsh to the 
machine to work. After a bunch of debugging, we discovered that 
rsh with no arguments just hung, but if you gave the rsh argu- 
ments, it worked fine (and subsequently we were able to get 
rsync over rsh working fine, too). Even though we got our 
immediate problem solved, I still want to know why rsh with no 
arguments fails because we wasted so much time debugging what 
turned out to be a non-issue. 
The rsh command is designed to connect to a target machine
and execute the specified command. If you don’t specify a
command when you initiate the connection, then you wind up 
exec’ing rlogin on the local machine instead of rsh. If you’d run a 
truss of the rsh process, you would have seen lines resembling the 
following, where rlogin replaces the rsh process: 


execve("/usr/bin/rlogin", 0xFFBFFB04, 0xFFBFFB10) argc
resolvepath("/usr/bin/rlogin", "/usr/bin/rlogin", 1023) = 15
resolvepath("/usr/lib/ld.so.1", "/usr/lib/ld.so.1", 1023) = 16
stat("/usr/bin/rlogin", 0xFFBFF8D0) =

If you’re seeing the rsh session just hang, there’s a good possibility 
that you’ve commented out the rlogin entry from /etc/inetd.conf 
and you’ve just left rsh enabled. Try uncommenting the entry for 
rlogin and see whether this fixes your problem: 

login stream tcp6 nowait root /usr/sbin/in.rlogind in.rlogind


We’re trying to install Oracle 10g on Solaris 9, but we keep 
failing the section of the validate test that deals with kernel 
parameters: 


Rule [ 170 ]: Kernel params OK? 


This rule verifies if the kernel parameters have been set according to 
the installation manual 


Test [ FAILED ] : 

SHMMAXUndef 

SHMMNIUndef 

SEMMNIUndef 

SEMMSLUndef 

SEMMNSUndef 

SEMVMXUndef =~ Kernel0K| Obsoleted 


Action: 


The kernel parameters have NOT been set according the installation 
manual of 10g RDBMS. Please refer to the installation manual. 


ReturnValue                 Action
SHMMAXTooSmall              Increase the kernel parameter SHMMAX to 4294967295
SHMMAXUndef                 SHMMAX has not been defined and needs to be set
                            to 4294967295
SHMMINTooSmall              Increase the kernel parameter SHMMIN to 1
                            - ignore this message if your OS is Solaris 9
SHMMINUndef                 SHMMIN has not been defined and needs to be set
                            to 1 - ignore this message if your OS is Solaris 9
SHMMNITooSmall              Increase the kernel parameter SHMMNI to at least 100
SHMMNIUndef                 SHMMNI has not been defined and needs to be set to 100
                            or more
SHMSEGTooSmall              Increase the kernel parameter SHMSEG to 10
                            - ignore this message if your OS is Solaris 9
SHMSEGUndef                 SHMSEG has not been defined and needs to be set
                            to 10 - ignore this message if your OS is Solaris 9
SEMMNITooSmall              Increase the kernel parameter SEMMNI to 100
SEMMNIUndef                 SEMMNI has not been defined and needs to be set to 100
SEMMSLTooSmall              Increase the kernel parameter SEMMSL to at least 100
SEMMSLNotDef                SEMMSL has not been defined and needs to be set to 100
SEMMNSTooSmall              Increase the kernel parameter SEMMNS to at least 256
SEMMNSUndef                 SEMMNS has not been defined and needs to be set to 256
SEMOPMTooSmall              Increase the kernel parameter SEMOPM to at least 100
SEMOPMUndef                 SEMOPM has not been defined and needs to be set to 100
SEMVMXTooSmall              Increase the kernel parameter SEMVMX to 32767
SEMVMXUndef                 SEMVMX has not been defined and needs to be set to 32767
NOEXEC_USER_STACKTooSmall   Increase the kernel parameter
                            NOEXEC_USER_STACK to 1 - ignore this message if your OS
                            is Solaris 9
NOEXEC_USER_STACKUndef      NOEXEC_USER_STACK has not been defined and
                            needs to be set to 1 - ignore this message if your OS is
                            Solaris 9
NoAccess                    You do not have access to /etc/sysdef
Obsoleted                   With Solaris 10 most shared memory and semaphore
                            settings are now obsolete. Consult sunsolve.sun.com and
                            documentation for System Admins on Solaris 10 for details


It says we’re missing settings for shmmax, shmmni, semmni, semmsl,
and semmns, but we have the following defined in /etc/system:


* Settings for oracle
set noexec_user_stack=1
set semsys:seminfo_semmni=100
set semsys:seminfo_semmns=1024
set semsys:seminfo_semmsl=256
set semsys:seminfo_semvmx=32767
set shmsys:shminfo_shmmax=4294967295
set shmsys:shminfo_shmmin=1
set shmsys:shminfo_shmmni=100
set shmsys:shminfo_shmseg=10
* End settings for oracle


If these aren’t the settings they want, what should we be using? 


You have the correct settings in /etc/system, but I suspect
you're attempting to validate the installation before you actually
start Oracle (as you should be). The problem you’re running
into is the way kernel modules function. Under Solaris, kernel mod- 
ules are not loaded until they’re actually needed by an application. 
When you run the validation test before starting Oracle itself, the 
shmsys and semsys modules remain unloaded and the test fails. You 
can either let it fail during the validation phase (and it will work 
after Oracle starts and the modules are loaded), or you can be rid of 
the validation warnings by forceloading the modules at boot time. If 
you’d rather perform the latter, add the following two lines to 
/etc/system and reboot: 


forceload: sys/shmsys 
forceload: sys/semsys 


Amy Rich has more than a decade of Unix systems administration experi- 
ence in various types of environments. Her current roles include that of 
Senior Systems Administrator for the University Systems Group at Tufts 
University, Unix systems administration consultant, and author. She can be 
reached at: qna@oceanwave.com.


Peregrine Systems Unveils Enterprise Discovery 2.0


Peregrine Systems, Inc. released Enterprise Discovery 2.0, a 
product suite that enables IT to automatically discover, inventory, 
and manage all hardware, enterprise software, open source and net- 
work devices. According to the company, Enterprise Discovery 2.0 
captures detailed, cross-platform configuration information that 
populates Peregrine’s Active CMDB. With out-of-box integration 
to Peregrine’s IT asset and service management solutions, 
Enterprise Discovery 2.0 provides critical configuration item status 
validation required for ITIL-based processes. 

According to the company, Enterprise Discovery 2.0 is a highly 
scalable solution capable of discovering more than 50,000 devices 
per server and supporting up to 500,000 devices per server group. 
Enterprise Discovery 2.0 provides software inventory scanners for 
AIX, HP-UX, Solaris, Linux, Windows, and legacy platforms.
Discovered files are reconciled against a library of more than 
12,000 known software titles. Enterprise Discovery 2.0 is a core 
component of Peregrine’s Asset Tracking, Expense Control, and 
Service Establishment solutions. 

For more information, visit: http://www.peregrine.com. 


ControlTower Console Manager for Linux Platforms Available


Carlo Gavazzi Computing Solutions released ControlTower
Console Management System, Version 3.1 for Linux platforms.
According to the company, ControlTower 3.1 is a solution for
monitoring and controlling multiple devices through an RJ-45 or DB-25
interface from a central location or by remote access. It enables a
single Linux-based system to function as a common console (monitor
and keyboard) for managed devices. ControlTower 3.1 is integrated
into the distributions of Mandriva, Red Hat, and SUSE.

The company says ControlTower 3.1 offers security options, error
handling, and logging features for greater customization and
flexibility. ControlTower 3.1 uses Pluggable Authentication Modules (PAM)
to authenticate users and allows systems administrators
to configure security at the level of the individual user. To ensure that
sensitive data is not accessible over the network, ControlTower 3.1
allows remote users to exchange data with the server using 128-bit
Twofish encryption. System access also can be restricted by IP
address, password, or Unix account. Systems administrators can
create and maintain managed device configuration files in a group.

Pricing for the ControlTower 3.1 software starts at $800. For
more information, visit: http://www.gavazzi-computing.com.


Sybase Introduces ASE 15 


Sybase, Inc. announced the latest version of its enterprise-class
relational database management system, Adaptive Server Enterprise
(ASE) 15. According to the company, Sybase ASE is a cost-effective
data management platform for business-critical computing. Sybase
ASE 15 extends management of unstructured and semi-structured
data with advanced XML capabilities and easier XML
document generation and schema validation. ASE 15 also includes
extended services for directly accessing unstructured data in operating
system files; these services are now bundled with the core server.

The new features in ASE 15 include: 


* Computed columns for easier application design

* Function indexes for increased query and field-handling performance

* Messaging services that allow application developers to build
event-driven applications

* Large server support (VLSS) to manage large data sets

* Advanced XML technology to natively store and process XML
documents

* Scrollable cursors

* Auto update statistics

* Job wizards that are helpful for novice administrators

* Advanced system metrics


ASE 15 editions range from a free Express Edition to the Enterprise
Family of products. The Express Edition is available for Linux
users and supports up to 5 GB of data, 2 GB of memory,
and one CPU. The Small Business Edition starts at $1,495. Adaptive
Server Enterprise 15 runs on Linux, Windows, Solaris, HP-UX, and
AIX operating systems.

For more information, visit: http://www.sybase.com/. 


Cyclades Unveils AlterPath OnBoard 


Cyclades Corporation announced the AlterPath OnBoard, a service
processor manager for out-of-band management of servers.
According to the company, the AlterPath OnBoard with Secure
Rack Management (SRM) consolidates and securely manages the service
processor technologies available in today’s next-generation servers.
The AlterPath OnBoard delivers SRM by isolating and protecting
the connected servers from the production network to provide
secure, efficient rack-level management with seamless integration
into the Out-of-Band Infrastructure (OOBI).

According to the company, the AlterPath OnBoard combines
hardware and software into an appliance designed to act as a protocol
gateway and enable the secure integration of servers so they can be
operated in a consistent manner, regardless of protocol type. The
AlterPath OnBoard physically consolidates and logically secures the
Ethernet connections needed for access to service processors embedded
in servers and blade servers offered by all major system vendors,
including Dell, Hewlett-Packard, IBM, and Sun Microsystems. With
the AlterPath OnBoard, systems administrators can locally or
remotely perform operations such as power cycle, remote console
access, hardware monitoring and management, and event management
across servers using one simplified user interface.

The AlterPath OnBoard will be available in December 2005 with 
pricing starting at approximately $3,000. For more information, 
visit: http://www.cyclades.com. 